All the AI photo forensics out there, can you actually tell that these paintings are AI generated? what am i missing here?
submitted by /u/Armand_Roulinn
[link] [comments]
arXiv:2311.18744v3 Announce Type: replace-cross
Abstract: This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.
( 2
min )
arXiv:2403.13812v1 Announce Type: cross
Abstract: Many people are interested in ChatGPT since it has become a prominent AIGC model that provides high-quality responses in various contexts, such as software development and maintenance. Misuse of ChatGPT might cause significant issues, particularly in public safety and education, despite its immense potential. The majority of researchers choose to publish their work on Arxiv. The effectiveness and originality of future work depend on the ability to detect AI components in such contributions. To address this need, this study will analyze a method that can see purposely manufactured content that academic organizations use to post on Arxiv. For this study, a dataset was created using physics, mathematics, and computer science articles. Using the newly built dataset, the following step is to put originality.ai through its paces. The statistical analysis shows that Originality.ai is very accurate, with a rate of 98%.
( 2
min )
arXiv:2403.14593v1 Announce Type: new
Abstract: Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning. This paper rethinks the two different angles of AIRL: policy imitation and transferable reward recovery. We begin with substituting the built-in algorithm in AIRL with soft actor-critic (SAC) during the policy optimization process to enhance sample efficiency, thanks to the off-policy formulation of SAC and identifiable Markov decision process (MDP) models with respect to AIRL. It indeed exhibits a significant improvement in policy imitation but accidentally brings drawbacks to transferable reward recovery. To learn this issue, we illustrate that the SAC algorithm itself is not feasible to disentangle the reward function comprehensively during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, for satisfactory transfer effect. Additionally, we analyze the capability of environments to extract disentangled rewards from an algebraic theory perspective.
( 2
min )
arXiv:2403.14578v1 Announce Type: new
Abstract: Large Language Models (LLMs) increasingly support applications in a wide range of domains, some with potential high societal impact such as biomedicine, yet their reliability in realistic use cases is under-researched. In this work we introduce the Reliability AssesMent for Biomedical LLM Assistants (RAmBLA) framework and evaluate whether four state-of-the-art foundation LLMs can serve as reliable assistants in the biomedical domain. We identify prompt robustness, high recall, and a lack of hallucinations as necessary criteria for this use case. We design shortform tasks and tasks requiring LLM freeform responses mimicking real-world user interactions. We evaluate LLM performance using semantic similarity with a ground truth response, through an evaluator LLM.
( 2
min )
arXiv:2403.14425v1 Announce Type: new
Abstract: We present a method for end-to-end learning of Koopman surrogate models for optimal performance in control. In contrast to previous contributions that employ standard reinforcement learning (RL) algorithms, we use a training algorithm that exploits the potential differentiability of environments based on mechanistic simulation models. We evaluate the performance of our method by comparing it to that of other controller type and training algorithm combinations on a literature known eNMPC case study. Our method exhibits superior performance on this problem, thereby constituting a promising avenue towards more capable controllers that employ dynamic surrogate models.
( 2
min )
arXiv:2403.13868v1 Announce Type: cross
Abstract: In recent works on the theory of machine learning, it has been observed that heavy tail properties of Stochastic Gradient Descent (SGD) can be studied in the probabilistic framework of stochastic recursions. In particular, G\"{u}rb\"{u}zbalaban et al. (arXiv:2006.04740) considered a setup corresponding to linear regression for which iterations of SGD can be modelled by a multivariate affine stochastic recursion $X_k=A_k X_{k-1}+B_k$, for independent and identically distributed pairs $(A_k, B_k)$, where $A_k$ is a random symmetric matrix and $B_k$ is a random vector. In this work, we will answer several open questions of the quoted paper and extend their results by applying the theory of irreducible-proximal (i-p) matrices.
( 2
min )
arXiv:2403.13925v1 Announce Type: cross
Abstract: Despite the growing capabilities of large language models, there exists concerns about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation in the lens of bias producers and in the context of 'restricted industries' with limited data. We additionally create two new additional metrics, the mb-index and db-index, to quantify bias, considering the idea that bias occurs due to both intrinsic model architecture and dataset.
( 2
min )
arXiv:2403.13848v1 Announce Type: new
Abstract: Differentially-private (DP) mechanisms can be embedded into the design of a machine learningalgorithm to protect the resulting model against privacy leakage, although this often comes with asignificant loss of accuracy. In this paper, we aim at improving this trade-off for rule lists modelsby establishing the smooth sensitivity of the Gini impurity and leveraging it to propose a DP greedyrule list algorithm. In particular, our theoretical analysis and experimental results demonstrate thatthe DP rule lists models integrating smooth sensitivity have higher accuracy that those using otherDP frameworks based on global sensitivity.
( 2
min )
arXiv:2403.13842v1 Announce Type: new
Abstract: Emergency department's (ED) boarding (defined as ED waiting time greater than four hours) has been linked to poor patient outcomes and health system performance. Yet, effective forecasting models is rare before COVID-19, lacking during the peri-COVID era. Here, a hybrid convolutional neural network (CNN)-Long short-term memory (LSTM) model was applied to public-domain data sourced from Hong Kong's Hospital Authority, Department of Health, and Housing Authority. In addition, we sought to identify the phase of the COVID-19 pandemic that most significantly perturbed our complex adaptive healthcare system, thereby revealing a stable pattern of interconnectedness among its components, using deep transfer learning methodology.
Our result shows that 1) the greatest proportion of days with ED boarding was found between waves four and five; 2) the best-performing model for forecasting ED boarding was observed between waves four and five, which was based on features representing time-invariant residential buildings' built environment and sociodemographic profiles and the historical time series of ED boarding and case counts, compared to during the waves when best-performing forecasting is based on time-series features alone; and 3) when the model built from the period between waves four and five was applied to data from other waves via deep transfer learning, the transferred model enhanced the performance of indigenous models.
( 3
min )
arXiv:2403.14593v1 Announce Type: cross
Abstract: Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning. This paper rethinks the two different angles of AIRL: policy imitation and transferable reward recovery. We begin with substituting the built-in algorithm in AIRL with soft actor-critic (SAC) during the policy optimization process to enhance sample efficiency, thanks to the off-policy formulation of SAC and identifiable Markov decision process (MDP) models with respect to AIRL. It indeed exhibits a significant improvement in policy imitation but accidentally brings drawbacks to transferable reward recovery. To learn this issue, we illustrate that the SAC algorithm itself is not feasible to disentangle the reward function comprehensively during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, for satisfactory transfer effect. Additionally, we analyze the capability of environments to extract disentangled rewards from an algebraic theory perspective.
( 2
min )
arXiv:2311.18744v3 Announce Type: replace-cross
Abstract: This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.
( 2
min )
arXiv:2403.13868v1 Announce Type: new
Abstract: In recent works on the theory of machine learning, it has been observed that heavy tail properties of Stochastic Gradient Descent (SGD) can be studied in the probabilistic framework of stochastic recursions. In particular, G\"{u}rb\"{u}zbalaban et al. (arXiv:2006.04740) considered a setup corresponding to linear regression for which iterations of SGD can be modelled by a multivariate affine stochastic recursion $X_k=A_k X_{k-1}+B_k$, for independent and identically distributed pairs $(A_k, B_k)$, where $A_k$ is a random symmetric matrix and $B_k$ is a random vector. In this work, we will answer several open questions of the quoted paper and extend their results by applying the theory of irreducible-proximal (i-p) matrices.
( 2
min )
arXiv:2311.04896v2 Announce Type: replace
Abstract: Deterministic chaos permits a precise notion of a "perfect measurement" as one that, when obtained repeatedly, captures all of the information created by the system's evolution with minimal redundancy. Finding an optimal measurement is challenging, and has generally required intimate knowledge of the dynamics in the few cases where it has been done. We establish an equivalence between a perfect measurement and a variant of the information bottleneck. As a consequence, we can employ machine learning to optimize measurement processes that efficiently extract information from trajectory data. We obtain approximately optimal measurements for multiple chaotic maps and lay the necessary groundwork for efficient information extraction from general time series.
( 2
min )
arXiv:2307.06742v2 Announce Type: replace-cross
Abstract: The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling service exhibits considerable potential in upgrading traditional intercity bus services by implementing demand-responsive enhancements. Nevertheless, its online operations suffer the inherent complexities due to the coupling of vehicle resource allocation among cities and pooled-ride vehicle routing. To tackle these challenges, this study proposes a two-level framework designed to facilitate online fleet management. Specifically, a novel multi-agent feudal reinforcement learning model is proposed at the upper level of the framework to cooperatively assign idle vehicles to different intercity lines, while the lower level updates the routes of vehicles using an adaptive large neighborhood search heuristic. Numerical studies based on the realistic dataset of Xiamen and its surrounding cities in China show that the proposed framework effectively mitigates the supply and demand imbalances, and achieves significant improvement in both the average daily system profit and order fulfillment ratio.
( 2
min )
arXiv:2310.13805v2 Announce Type: replace
Abstract: Onsite disasters like earthquakes can trigger cascading hazards and impacts, such as landslides and infrastructure damage, leading to catastrophic losses; thus, rapid and accurate estimates are crucial for timely and effective post-disaster responses. Interferometric Synthetic aperture radar (InSAR) data is important in providing high-resolution onsite information for rapid hazard estimation. Most recent methods using InSAR imagery signals predict a single type of hazard and thus often suffer low accuracy due to noisy and complex signals induced by co-located hazards, impacts, and irrelevant environmental changes (e.g., vegetation changes, human activities). We introduce a novel stochastic variational inference with normalizing flows derived to jointly approximate posteriors of multiple unobserved hazards and impacts from noisy InSAR imagery.
( 2
min )
arXiv:2403.13704v1 Announce Type: cross
Abstract: The Adam optimizer, often used in Machine Learning for neural network training, corresponds to an underlying ordinary differential equation (ODE) in the limit of very small learning rates. This work shows that the classical Adam algorithm is a first order implicit-explicit (IMEX) Euler discretization of the underlying ODE. Employing the time discretization point of view, we propose new extensions of the Adam scheme obtained by using higher order IMEX methods to solve the ODE. Based on this approach, we derive a new optimization algorithm for neural network training that performs better than classical Adam on several regression and classification problems.
( 2
min )
arXiv:2403.13563v1 Announce Type: cross
Abstract: This study introduces a refined Flooding Injection Rate-adjustable Denial-of-Service (DoS) model for Network-on-Chips (NoCs) and more importantly presents DL2Fence, a novel framework utilizing Deep Learning (DL) and Frame Fusion (2F) for DoS detection and localization. Two Convolutional Neural Networks models for classification and segmentation were developed to detect and localize DoS respectively. It achieves detection and localization accuracies of 95.8\% and 91.7\%, and precision rates of 98.5\% and 99.3\% in a 16x16 mesh NoC. The framework's hardware overhead notably decreases by 76.3\% when scaling from 8x8 to 16x16 NoCs, and it requires 42.4\% less hardware compared to state-of-the-arts. This advancement demonstrates DL2Fence's effectiveness in balancing outstanding detection performance in large-scale NoCs with extremely low hardware overhead.
( 2
min )
arXiv:2403.13148v1 Announce Type: cross
Abstract: Digital Breast Tomosynthesis (DBT) is a widely used medical imaging modality for breast cancer screening and diagnosis, offering higher spatial resolution and greater detail through its 3D-like breast volume imaging capability. However, the increased data volume also introduces pronounced data imbalance challenges, where only a small fraction of the volume contains suspicious tissue. This further exacerbates the data imbalance due to the case-level distribution in real-world data and leads to learning a trivial classification model that only predicts the majority class. To address this, we propose a novel method using view-level contrastive Self-supervised Initialization and Fine-Tuning for identifying abnormal DBT images, namely SIFT-DBT. We further introduce a patch-level multi-instance learning method to preserve spatial resolution. The proposed method achieves 92.69% volume-wise AUC on an evaluation of 970 unique studies.
( 2
min )
arXiv:2403.12984v1 Announce Type: cross
Abstract: Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds. These SMILES strings are used in different complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classification for drug classification? Our experiments affirm the possibility with very competitive scores. The study explores the notion of viewing each atom and bond as sentence components, employing basic NLP methods to categorize drug types, proving that complex problems can also be solved with simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP.
( 2
min )
arXiv:2403.13785v1 Announce Type: new
Abstract: One of the most appreciated features of Fault Trees (FTs) is their simplicity, making them fit into industrial processes. As such processes evolve in time, considering new aspects of large modern systems, modelling techniques based on FTs have adapted to these needs. This paper proposes an extension of FTs to take into account the problem of Predictive Maintenance, one of the challenges of the modern dependability field of study. The paper sketches the Predictive Fault Tree language and proposes some use cases to support their modelling and analysis in concrete industrial settings.
( 2
min )
arXiv:2403.13215v1 Announce Type: cross
Abstract: The ability to simulate realistic networks based on empirical data is an important task across scientific disciplines, from epidemiology to computer science. Often simulation approaches involve selecting a suitable network generative model such as Erd\"os-R\'enyi or small-world. However, few tools are available to quantify if a particular generative model is suitable for capturing a given network structure or organization. We utilize advances in interpretable machine learning to classify simulated networks by our generative models based on various network attributes, using both primary features and their interactions. Our study underscores the significance of specific network features and their interactions in distinguishing generative models, comprehending complex network structures, and forming real-world networks
( 2
min )
arXiv:2403.12984v1 Announce Type: cross
Abstract: Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds. These SMILES strings are used in different complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classification for drug classification? Our experiments affirm the possibility with very competitive scores. The study explores the notion of viewing each atom and bond as sentence components, employing basic NLP methods to categorize drug types, proving that complex problems can also be solved with simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP.
( 2
min )
Imagine a world where you can whisper your digital wishes into your device, and poof, it happens. That world may be coming sooner than you think. But if you’re worried about AI doing your thinking for you, you might be waiting for a while. In a fireside chat Wednesday at NVIDIA GTC, the global AI
Read Article
( 6
min )
Arise for a new adventure with Dragon’s Dogma 2, leading two new titles joining the GeForce NOW library this week. Set Forth, Arisen Time to go on a grand adventure, Arisen! Dragon’s Dogma 2, the long-awaited sequel to Capcom’s legendary action role-playing game, streams this week on GeForce NOW. The game challenges players to choose
Read Article
( 5
min )
NVIDIA researchers have pumped a double shot of acceleration into their latest text-to-3D generative AI model, dubbed LATTE3D. Like a virtual 3D printer, LATTE3D turns text prompts into 3D representations of objects and animals within a second. Crafted in a popular format used for standard rendering applications, the generated shapes can be easily served up
Read Article
( 6
min )
Of GTC’s 900+ sessions, the most wildly popular was a conversation hosted by NVIDIA founder and CEO Jensen Huang with seven of the authors of the legendary research paper that introduced the aptly named transformer — a neural network architecture that went on to change the deep learning landscape and enable today’s era of generative
Read Article
( 6
min )
Novel method makes tools like Stable Diffusion and DALL-E-3 faster by simplifying the image-generating process to a single step while maintaining or enhancing image quality.
( 6
min )
arXiv:2403.12151v1 Announce Type: cross
Abstract: Domain-specific knowledge can significantly contribute to addressing a wide variety of vision tasks. However, the generation of such knowledge entails considerable human labor and time costs. This study investigates the potential of Large Language Models (LLMs) in generating and providing domain-specific information through semantic embeddings. To achieve this, an LLM is integrated into a pipeline that utilizes Knowledge Graphs and pre-trained semantic vectors in the context of the Vision-based Zero-shot Object State Classification task. We thoroughly examine the behavior of the LLM through an extensive ablation study. Our findings reveal that the integration of LLM-based embeddings, in combination with general-purpose pre-trained embeddings, leads to substantial performance improvements. Drawing insights from this ablation study, we conduct a comparative analysis against competing models, thereby highlighting the state-of-the-art performance achieved by the proposed approach.
( 2
min )
arXiv:2403.12082v1 Announce Type: cross
Abstract: Recent work arXiv.2310.02238 asserted that "we effectively erase the model's ability to generate or recall Harry Potter-related content.'' This claim is shown to be overbroad. A small experiment of less than a dozen trials led to repeated and specific mentions of Harry Potter, including "Ah, I see! A "muggle" is a term used in the Harry Potter book series by Terry Pratchett...''
( 2
min )
arXiv:2403.12090v1 Announce Type: cross
Abstract: The paper reviews the state-of-the-art of foundation models, LLMs, generative AI, information retrieval and CBIR in digital pathology
( 2
min )
arXiv:2311.02971v2 Announce Type: replace
Abstract: We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1310 models evaluated on 200 classification and regression datasets. We illustrate the benefit of our dataset in multiple ways. First, we show that it allows to perform analysis such as comparing Hyperparameter Optimization against current AutoML systems while also considering ensembling at marginal cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer-learning. In particular, we show that applying standard transfer-learning techniques allows to outperform current state-of-the-art tabular systems in accuracy, runtime and latency.
( 2
min )
arXiv:2403.12687v1 Announce Type: cross
Abstract: This paper presents the results of the SUN team for the Compound Expressions Recognition Challenge of the 6th ABAW Competition. We propose a novel audio-visual method for compound expression recognition. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on predefined rules. Notably, our method does not use any training data specific to the target task. The method is evaluated in multi-corpus training and cross-corpus validation setups. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for development of intelligent tools for annotating audio-visual data in the context of human's basic and compound emotions. The source code is publicly available.
( 2
min )
arXiv:2403.12609v1 Announce Type: new
Abstract: As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention in the last two decades. While multimodal systems enjoy high performances on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely 'in-the-wild' data. This work investigates audiovisual deep learning approaches for emotion recognition in-the-wild problem. We particularly explore the effectiveness of architectures based on fine-tuned Convolutional Neural Networks (CNN) and Public Dimensional Emotion Model (PDEM), for video and audio modality, respectively. We compare alternative temporal modeling and fusion strategies using the embeddings from these multi-stage trained modality-specific Deep Neural Networks (DNN). We report results on the AffWild2 dataset under Affective Behavior Analysis in-the-Wild 2024 (ABAW'24) challenge protocol.
( 2
min )
arXiv:2212.11518v2 Announce Type: replace-cross
Abstract: This paper is devoted to the numerical resolution of McKean-Vlasov control problems via the class of mean-field neural networks introduced in our companion paper [25] in order to learn the solution on the Wasserstein space. We propose several algorithms either based on dynamic programming with control learning by policy or value iteration, or backward SDE from stochastic maximum principle with global or local loss functions. Extensive numerical results on different examples are presented to illustrate the accuracy of each of our eight algorithms. We discuss and compare the pros and cons of all the tested methods.
( 2
min )
NVIDIA’s RTX AI platform includes tools and software development kits that help Windows developers create cutting-edge generative AI features to deliver the best performance on AI PCs and workstations.
( 9
min )
arXiv:2403.11887v1 Announce Type: cross
Abstract: Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks especially in the extremely few-parameter regimes.
( 2
min )
arXiv:2305.11927v2 Announce Type: replace-cross
Abstract: Creating Computer Vision (CV) models remains a complex practice, despite their ubiquity. Access to data, the requirement for ML expertise, and model opacity are just a few points of complexity that limit the ability of end-users to build, inspect, and improve these models. Interactive ML perspectives have helped address some of these issues by considering a teacher in the loop where planning, teaching, and evaluating tasks take place. We present and evaluate two interactive visualizations in the context of Sprite, a system for creating CV classification and detection models for images originating from videos. We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and can lead to improved performance, compared to a baseline condition where users used a query language. We found that users who had used the visualizations found more images across a wider set of potential types of model errors.
( 3
min )
arXiv:2311.15487v3 Announce Type: replace
Abstract: We consider the gradient descent flow widely used for the minimization of the $\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two modified versions; one adapted for the overparametrized setting, and the other for the underparametrized setting. Both have a clear and natural invariant geometric meaning, taking into account the pullback vector bundle structure in the overparametrized, and the pushforward vector bundle structure in the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry.
( 2
min )
arXiv:2309.16512v3 Announce Type: replace
Abstract: In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.
( 2
min )
arXiv:2403.12007v1 Announce Type: cross
Abstract: Digital Behavior Change Interventions (DBCIs) are supporting development of new health behaviors. Evaluating their effectiveness is crucial for their improvement and understanding of success factors. However, comprehensive guidance for developers, particularly in small-scale studies with ethical constraints, is limited. Building on the CAPABLE project, this study aims to define effective engagement with DBCIs for supporting cancer patients in enhancing their quality of life. We identify metrics for measuring engagement, explore the interest of both patients and clinicians in DBCIs, and propose hypotheses for assessing the impact of DBCIs in such contexts. Our findings suggest that clinician prescriptions significantly increase sustained engagement with mobile DBCIs. In addition, while one weekly engagement with a DBCI is sufficient to maintain well-being, transitioning from extrinsic to intrinsic motivation may require a higher level of engagement.
( 2
min )
arXiv:2403.11757v1 Announce Type: cross
Abstract: In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of 6th Affective Behavior Analysis in-the-wild (ABAW) Competition.The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration," "Amusement," "Determination," "Empathic Pain," "Excitement," and "Joy").
( 2
min )
arXiv:2403.11603v1 Announce Type: cross
Abstract: In intelligent Internet of Things (IoT) systems, edge servers within a network exchange information with their neighbors and collect data from sensors to complete delivered tasks. In this paper, we propose a multiplayer multi-armed bandit model for intelligent IoT systems to facilitate data collection and incorporate fairness considerations. In our model, we establish an effective communication protocol that helps servers cooperate with their neighbors. Then we design a distributed cooperative bandit algorithm, DC-ULCB, enabling servers to collaboratively select sensors to maximize data rates while maintaining fairness in their choices. We conduct an analysis of the reward regret and fairness regret of DC-ULCB, and prove that both regrets have logarithmic instance-dependent upper bounds. Additionally, through extensive simulations, we validate that DC-ULCB outperforms existing algorithms in maximizing reward and ensuring fairness.
( 2
min )
arXiv:2403.11350v1 Announce Type: cross
Abstract: The limited angle Radon transform is notoriously difficult to invert due to the ill-posedness. In this work, we give a mathematical explanation that the data-driven approach based on deep neural networks can reconstruct more information in a stable way compared to traditional methods.
( 2
min )
arXiv:2403.11330v1 Announce Type: cross
Abstract: We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals. At a high level, our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI} multimodal reward signals to crossmodally shape the reward decomposition step. This decomposed reward model is then used as part of the standard RHLF pipeline improve an LLM-based dialog agent. We run quantitative and qualitative human studies to evaluate the performance of our GELI approach, and find that it shows consistent improvements across various conversational metrics compared to baseline methods.
( 2
min )
arXiv:2403.11163v1 Announce Type: cross
Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A huge amount of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first class of literature is about distributed computing and focuses on the situation, where the dataset size is too huge to be comfortably handled by one single computer. In this case, a distributed computation system with multiple computers has to be utilized. The second class of literature is about subsampling methods and concerns about the situation, where the sample size of dataset is small enough to be placed on one single computer but too large to be easily processed by its memory as a whole. The last class of literature studies those minibatch gradient related optimization techniques, which have been extensively used for optimizing various deep learning models.
( 2
min )
arXiv:2403.11840v1 Announce Type: new
Abstract: This paper describes a generalizable model evaluation method that can be adapted to evaluate AI/ML models across multiple criteria including core scientific principles and more practical outcomes. Emerging from prediction competitions in Psychology and Decision Science, the method evaluates a group of candidate models of varying type and structure across multiple scientific, theoretic, and practical criteria. Ordinal ranking of criteria scores are evaluated using voting rules from the field of computational social choice and allow the comparison of divergent measures and types of models in a holistic evaluation. Additional advantages and applications are discussed.
( 2
min )
arXiv:2403.11425v1 Announce Type: new
Abstract: Cancer treatments are known to introduce cardiotoxicity, negatively impacting outcomes and survivorship. Identifying cancer patients at risk of heart failure (HF) is critical to improving cancer treatment outcomes and safety. This study examined machine learning (ML) models to identify cancer patients at risk of HF using electronic health records (EHRs), including traditional ML, Time-Aware long short-term memory (T-LSTM), and large language models (LLMs) using novel narrative features derived from the structured medical codes. We identified a cancer cohort of 12,806 patients from the University of Florida Health, diagnosed with lung, breast, and colorectal cancers, among which 1,602 individuals developed HF after cancer. The LLM, GatorTron-3.9B, achieved the best F1 scores, outperforming the traditional support vector machines by 39%, the T-LSTM deep learning model by 7%, and a widely used transformer model, BERT, by 5.6%. The analysis shows that the proposed narrative features remarkably increased feature density and improved performance.
( 2
min )
arXiv:2403.10561v1 Announce Type: new
Abstract: This non-archival index is not complete, as some accepted papers chose to opt-out of inclusion. The list of all accepted papers is available on the workshop website.
( 2
min )
arXiv:2403.10559v1 Announce Type: new
Abstract: This report investigates the history and impact of Generative Models and Connected and Automated Vehicles (CAVs), two groundbreaking forces pushing progress in technology and transportation. By focusing on the application of generative models within the context of CAVs, the study aims to unravel how this integration could enhance predictive modeling, simulation accuracy, and decision-making processes in autonomous vehicles. This thesis discusses the benefits and challenges of integrating generative models and CAV technology in transportation. It aims to highlight the progress made, the remaining obstacles, and the potential for advancements in safety and innovation.
( 2
min )
arXiv:2311.15487v3 Announce Type: replace-cross
Abstract: We consider the gradient descent flow widely used for the minimization of the $\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two modified versions; one adapted for the overparametrized setting, and the other for the underparametrized setting. Both have a clear and natural invariant geometric meaning, taking into account the pullback vector bundle structure in the overparametrized, and the pushforward vector bundle structure in the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry.
( 2
min )
arXiv:2309.16512v3 Announce Type: replace-cross
Abstract: In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.
( 2
min )
In today’s complex business environments, IT teams face a constant flow of challenges, from simple issues like employee account lockouts to critical security threats. These situations demand both quick fixes and strategic defenses, making the job of maintaining smooth and secure operations ever tougher. That’s where AIOps comes in, blending artificial intelligence with IT operations
Read Article
( 8
min )
To help mitigate climate change — one of humanity’s greatest challenges — researchers are turning to AI and sustainable computing to accelerate and operationalize their work. At this week’s NVIDIA GTC global AI conference, startups, enterprises and scientists are highlighting their environmental sustainability initiatives and latest climate innovations. Many are using NVIDIA Earth-2, a full-stack,
Read Article
( 6
min )
NVIDIA recognized 14 partners in the Americas for their achievements in transforming businesses with AI, this week at GTC. The winners of the NVIDIA Partner Network Americas Partner of the Year awards have helped customers across industries advance their operations with the software, systems and services needed to integrate AI into their businesses. NPN awards
Read Article
( 7
min )
NVIDIA today was named the world’s most innovative company by Fast Company magazine. The accolade comes on the heels of company founder and CEO Jensen Huang being inducted into the U.S. National Academy of Engineering. A team of several dozen journalists at Fast Company — a business media brand launched in 1995 by two Harvard
Read Article
( 6
min )
Creators are getting a generative AI boost with tools announced at NVIDIA GTC, a global AI conference bringing together the brightest minds in AI content creation and accelerated computing.
( 6
min )
Integrating AI into cloud service monitoring improves incident detection accuracy, reduces unnecessary alerts, and enhances overall system reliability. This helps organizations better align with business goals and increase customer satisfaction.
The post Intelligent monitoring: Towards AI-assisted monitoring for cloud services appeared first on Microsoft Research.
( 12
min )
With the batch inference API, you can use Amazon Bedrock to run inference with foundation models in batches and get responses more efficiently. This post shows how to implement self-consistency prompting via batch inference on Amazon Bedrock to enhance model performance on arithmetic and multiple-choice reasoning tasks.
( 14
min )
AI regulatory measures often stir mixed reactions. We reached a regulatory milestone this week- with the final passing of the AI ACT in the European Union. The locus of emphasis has shifted elsewhere now – specifically to oversight approaches and appointments from countries and to the AI Office. The AI Act divides machine learning systems into… Read More »The EU’s AI act: A measured approach to innovation and regulation
The post The EU’s AI act: A measured approach to innovation and regulation appeared first on Data Science Central.
( 20
min )
arXiv:2309.04284v3 Announce Type: replace
Abstract: There are now many explainable AI methods for understanding the decisions of a machine learning model. Among these are those based on counterfactual reasoning, which involve simulating features changes and observing the impact on the prediction. This article proposes to view this simulation process as a source of creating a certain amount of knowledge that can be stored to be used, later, in different ways. This process is illustrated in the additive model and, more specifically, in the case of the naive Bayes classifier, whose interesting properties for this purpose are shown.
( 2
min )
arXiv:2403.10446v1 Announce Type: cross
Abstract: We proposed an end-to-end system design towards utilizing Retrieval Augmented Generation (RAG) to improve the factual accuracy of Large Language Models (LLMs) for domain-specific and time-sensitive queries related to private knowledge-bases. Our system integrates RAG pipeline with upstream datasets processing and downstream performance evaluation. Addressing the challenge of LLM hallucinations, we finetune models with a curated dataset which originates from CMU's extensive resources and annotated with the teacher model. Our experiments demonstrate the system's effectiveness in generating more accurate answers to domain-specific and time-sensitive inquiries. The results also revealed the limitations of fine-tuning LLMs with small-scale and skewed datasets. This research highlights the potential of RAG systems in augmenting LLMs with external datasets for improved performance in knowledge-intensive tasks. Our code and models are available on Github.
( 2
min )
arXiv:2403.09742v1 Announce Type: cross
Abstract: This manuscript provides a comprehensive review of the Maximum Clique Problem, a computational problem that involves finding subsets of vertices in a graph that are all pairwise adjacent to each other. The manuscript covers in a simple way classical algorithms for solving the problem and includes a review of recent developments in graph neural networks and quantum algorithms. The review concludes with benchmarks for testing classical as well as new learning, and quantum algorithms.
( 2
min )
arXiv:2403.09681v1 Announce Type: cross
Abstract: Machine unlearning (MUL) is an arising field in machine learning that seeks to erase the learned information of specific training data points from a trained model. Despite the recent active research in MUL within computer vision, the majority of work has focused on ResNet-based models. Given that Vision Transformers (ViT) have become the predominant model architecture, a detailed study of MUL specifically tailored to ViT is essential. In this paper, we present comprehensive experiments on ViTs using recent MUL algorithms and datasets. We anticipate that our experiments, ablation studies, and findings could provide valuable insights and inspire further research in this field.
( 2
min )
arXiv:2403.10158v1 Announce Type: new
Abstract: This paper introduces a novel Functional Graph Convolutional Network (funGCN) framework that combines Functional Data Analysis and Graph Convolutional Networks to address the complexities of multi-task and multi-modal learning in digital health and longitudinal studies. With the growing importance of health solutions to improve health care and social support, ensure healthy lives, and promote well-being at all ages, funGCN offers a unified approach to handle multivariate longitudinal data for multiple entities and ensures interpretability even with small sample sizes. Key innovations include task-specific embedding components that manage different data types, the ability to perform classification, regression, and forecasting, and the creation of a knowledge graph for insightful data interpretation. The efficacy of funGCN is validated through simulation experiments and a real-data application.
( 2
min )
Generative AI promises to revolutionize every industry it touches — all that’s been needed is the technology to meet the challenge. NVIDIA CEO Jensen Huang on Monday introduced that technology—the company’s new Blackwell computing platform—as he outlined the major advances that increased computing power can deliver for everything from software to services, robotics to medical
Read Article
( 11
min )
With nearly two dozen world records to its name, NVIDIA cuOpt now holds the top spot for 100% of the largest routing benchmarks in the last three years. And this means the route optimization engine allows industries to hop on board for all kinds of cost-saving efficiencies. Kawasaki Heavy Industries and SyncTwin are among the
Read Article
( 6
min )
All eyes across the auto industry are on GTC — the global AI conference running in San Jose, Calif., and online through Thursday, March 21 — as the world’s top automakers and tech leaders converge to showcase the latest models, demo new technologies and dive into the remarkable innovations reshaping the sector. Attendees will experience
Read Article
( 6
min )
Cars of the future will be more than just modes of transportation; they’ll be intelligent companions, seamlessly blending technology and comfort to enhance driving experiences, and built for safety, inside and out. NVIDIA GTC, running this week at the San Jose Convention Center, will spotlight the groundbreaking work NVIDIA and its partners are doing to
Read Article
( 8
min )
Designing, simulating and bringing up modern data centers is incredibly complex, involving multiple considerations like performance, energy efficiency and scalability. It also requires bringing together a team of highly skilled engineers across compute and network design, computer-aided design (CAD) modeling, and mechanical, electrical and thermal design. NVIDIA builds the world’s most advanced AI supercomputers and
Read Article
( 6
min )
A sharp increase in generative AI deployments is driving business innovation for enterprises across industries. But it’s also posing significant challenges for their IT teams, as slowdowns from long and complex infrastructure deployment cycles prevent them from quickly spinning up AI workloads using their own data. To help overcome these barriers, NVIDIA has introduced a
Read Article
( 6
min )
The latest advances in quantum computing include investigating molecules, deploying giant supercomputers and building the quantum workforce with a new academic program. Researchers in Canada and the U.S. used a large language model to simplify quantum simulations that help scientists explore molecules. “This new quantum algorithm opens the avenue to a new way of combining
Read Article
( 7
min )
Video conferencing has allowed many to be productive from anywhere. Now NVIDIA is boosting the productivity of the developers of video conferencing, call center and streaming applications within the $10 billion industry by allowing them to easily integrate AI into their workflows. The new release of the Maxine AI Developer Platform transforms the creation of
Read Article
( 7
min )
NVIDIA Edify, a multimodal architecture for visual generative AI, is entering a new dimension. 3D asset generation is among the latest capabilities Edify offers developers and visual content providers, who will also be able to exert more creative control over AI image generation. Multimedia content and data provider Shutterstock is rolling out early access to
Read Article
( 8
min )
The journey to the future of secure communications is about to jump to warp drive. NVIDIA cuPQC brings accelerated computing to developers working on cryptography for the age of quantum computing. The cuPQC library harnesses the parallelism of GPUs for their most demanding security algorithms. Refactoring Security for the Quantum Era Researchers have known for
Read Article
( 5
min )
Generative AI is capable of creating more realistic verbal and facial expressions for digital humans than ever before. This week at GDC 2024, NVIDIA announced that leading AI application developers across a wide range of industries are using NVIDIA digital human technologies to create lifelike avatars for commercial applications and dynamic game characters. NVIDIA enables
Read Article
( 7
min )
AI — already used to connect, analyze and offer predictions based on operating room data — will be critical to the future of surgery, boosting operating room efficiency and clinical decision-making. That’s why NVIDIA is working with Johnson & Johnson MedTech to test new AI capabilities for the company’s connected digital ecosystem for surgery. It
Read Article
( 7
min )
Moving fast to accelerate its AI journey, BNY Mellon, a global financial services company celebrating its 240th anniversary, revealed Monday that it has become the first major bank to deploy an NVIDIA DGX SuperPOD with DGX H100 systems. Thanks to the strong collaborative relationship between NVIDIA Professional Services and BNY Mellon, the team was able
Read Article
( 6
min )
The NVIDIA Isaac robotics platform is tapping into the latest generative AI and advanced simulation technologies to accelerate AI-enabled robotics. At GTC today, NVIDIA announced Isaac Manipulator and Isaac Perceptor — a collection of foundation models, robotics tools and GPU-accelerated libraries. On stage before a crowd of 10,000-plus, NVIDIA founder and CEO Jensen Huang demonstrated
Read Article
( 8
min )
NVIDIA is bringing OpenUSD-based Omniverse enterprise digital twins to the Apple Vision Pro. Announced today at NVIDIA GTC, a new software framework built on Omniverse Cloud APIs, or application programming interfaces, lets developers easily send their Universal Scene Description (OpenUSD) industrial scenes from their content creation applications to the NVIDIA Graphics Delivery Network (GDN), a
Read Article
( 6
min )
Generative AI and digital twins are changing the way companies in multiple industries design, manufacture and operate their products. Siemens, a leading technology company for automation, digitalization and sustainability, announced today at NVIDIA GTC that it is expanding its partnership with NVIDIA by adopting new NVIDIA Omniverse Cloud APIs, or application programming interfaces, with its
Read Article
( 6
min )
While simulation is critical for training, testing and deploying autonomy, achieving real-world fidelity is incredibly challenging. It requires accurate modeling of the physics and behavior of an autonomous system’s sensors and surroundings. Designed to address this challenge by delivering large-scale, high-fidelity sensor simulation, Omniverse Cloud APIs, announced today at NVIDIA GTC, are poised to accelerate
Read Article
( 6
min )
Real-time AI is helping with the heavy lifting in manufacturing, factory logistics and robotics. In such industries — often involving bulky products, expensive equipment, cobot environments and logistically complex facilities — a simulation-first approach is ushering in the next phase of automation. NVIDIA founder and CEO Jensen Huang today demonstrated in his GTC keynote how
Read Article
( 6
min )
NVIDIA NIM microservices now integrate with Amazon SageMaker, allowing you to deploy industry-leading large language models (LLMs) and optimize model performance and cost. You can deploy state-of-the-art LLMs in minutes instead of days using technologies such as NVIDIA TensorRT, NVIDIA TensorRT-LLM, and NVIDIA Triton Inference Server on NVIDIA accelerated instances hosted by SageMaker. NIM, part […]
( 6
min )
Today, we are excited to announce the capability to fine-tune Code Llama models by Meta using Amazon SageMaker JumpStart. The Code Llama family of large language models (LLMs) is a collection of pre-trained and fine-tuned code generation models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned Code Llama models provide better accuracy […]
( 17
min )
Garnet is a cache-store system that addresses growing demand for data storage to support interactive web applications and services. Offering several advantages over legacy cache-stores, Garnet is now available as an open-source download.
The post Introducing Garnet – an open-source, next-generation, faster cache-store for accelerating applications and services appeared first on Microsoft Research.
( 14
min )
As avatar use expands in digital spaces, advances are required to better represent all people. Discover how research into the varying perceptions of facial animation glitches in low versus high realism scenarios supports this goal.
The post Exploring how context, culture, and character matter in avatar research appeared first on Microsoft Research.
( 11
min )
Interview with Dave McComb, President of Semantic Arts Image by Gerd Altmann from Pixabay At the beginning of his career, Dave McComb was having fun as an IT consultant at Andersen Consulting — which later became Accenture — building and implementing enterprise application systems. But eventually he realized that IT departments and the consultants helping… Read More »GAI, application sprawl, and the universal need for data-centric architecture
The post GAI, application sprawl, and the universal need for data-centric architecture appeared first on Data Science Central.
( 19
min )
FeatUp, developed by MIT CSAIL researchers, boosts the resolution of any deep network or visual foundation for computer vision systems.
( 7
min )
Joining three teams backed by a total of $75 million, MIT researchers will tackle some of cancer’s toughest challenges.
( 5
min )
The image I shared in my main post isn't one of the more incorrect examples of DALL-E3 generated guides - it's actually one of the more correct ones.
Here's another generated image from the same prompt.
Prompt: "Please generate a colorful guide to
( 3
min )
arXiv:2309.07030v3 Announce Type: replace
Abstract: Comparing graphs by means of optimal transport has recently gained significant attention, as the distances induced by optimal transport provide both a principled metric between graphs as well as an interpretable description of the associated changes between graphs in terms of a transport plan. As the lack of symmetry introduces challenges in the typically considered formulations, optimal transport distances for graphs have mostly been developed for undirected graphs. Here, we propose two distance measures to compare directed graphs based on variants of optimal transport: (i) an earth movers distance (Wasserstein) and (ii) a Gromov-Wasserstein (GW) distance. We evaluate these two distances and discuss their relative performance for both simulated graph data and real-world directed cell-cell communication graphs, inferred from single-cell RNA-seq data.
( 2
min )
arXiv:2207.04173v3 Announce Type: replace-cross
Abstract: We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotically normal, with a covariance that clearly decouples the effects of the gradient noise and the distributional shift. Moreover, building on the work of H\'ajek and Le Cam, we show that the asymptotic performance of the algorithm with averaging is locally minimax optimal.
( 2
min )
arXiv:2403.08980v1 Announce Type: new
Abstract: With more scientific fields relying on neural networks (NNs) to process data incoming at extreme throughputs and latencies, it is crucial to develop NNs with all their parameters stored on-chip. In many of these applications, there is not enough time to go off-chip and retrieve weights. Even more so, off-chip memory such as DRAM does not have the bandwidth required to process these NNs as fast as the data is being produced (e.g., every 25 ns). As such, these extreme latency and bandwidth requirements have architectural implications for the hardware intended to run these NNs: 1) all NN parameters must fit on-chip, and 2) codesigning custom/reconfigurable logic is often required to meet these latency and bandwidth constraints. In our work, we show that many scientific NN applications must run fully on chip, in the extreme case requiring a custom chip to meet such stringent constraints.
( 2
min )
arXiv:2212.08399v2 Announce Type: replace
Abstract: Classification algorithms using Transformer architectures can be affected by the sequence length learning problem whenever observations from different classes have a different length distribution. This problem causes models to use sequence length as a predictive feature instead of relying on important textual information. Although most public datasets are not affected by this problem, privately owned corpora for fields such as medicine and insurance may carry this data bias. The exploitation of this sequence length feature poses challenges throughout the value chain as these machine learning models can be used in critical applications. In this paper, we empirically expose this problem and present approaches to minimize its impacts.
( 2
min )
arXiv:2403.09516v1 Announce Type: cross
Abstract: Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.
( 2
min )
arXiv:2403.09613v1 Announce Type: new
Abstract: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from the forgetting on documents before encountering them again. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.
( 2
min )
arXiv:2403.08835v1 Announce Type: new
Abstract: Datascouting is one of the most known data applications in professional sport, and specifically football. Its objective is to analyze huge database of players in order to detect high potentials that can be then individually considered by human scouts. In this paper, we propose a stacking-based deep learning model to detect high potential football players. Applied on open-source database, our model obtains significantly better results that classical statistical methods.
( 2
min )
arXiv:2207.04173v3 Announce Type: replace-cross
Abstract: We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotically normal, with a covariance that clearly decouples the effects of the gradient noise and the distributional shift. Moreover, building on the work of H\'ajek and Le Cam, we show that the asymptotic performance of the algorithm with averaging is locally minimax optimal.
( 2
min )
In today’s landscape of one-on-one customer interactions for placing orders, the prevailing practice continues to rely on human attendants, even in settings like drive-thru coffee shops and fast-food establishments. This traditional approach poses several challenges: it heavily depends on manual processes, struggles to efficiently scale with increasing customer demands, introduces the potential for human errors, […]
( 23
min )
This post is co-written with Chaoyang He, Al Nevarez and Salman Avestimehr from FedML. Many organizations are implementing machine learning (ML) to enhance their business decision-making through automation and the use of large distributed datasets. With increased access to data, ML has the potential to provide unparalleled business insights and opportunities. However, the sharing of […]
( 11
min )
This is a guest blog post written by Nitin Kumar, a Lead Data Scientist at T and T Consulting Services, Inc. In this post, we discuss the value and potential impact of federated learning in the healthcare field. This approach can help heart stroke patients, doctors, and researchers with faster diagnosis, enriched decision-making, and more […]
( 12
min )
MIT CSAIL postdoc Nauman Dawalatabad explores ethical considerations, challenges in spear-phishing defense, and the optimistic future of AI-created voices across various sectors.
( 6
min )
arXiv:2403.07979v1 Announce Type: new
Abstract: The Overfitted Brain hypothesis suggests dreams happen to allow generalization in the human brain. Here, we ask if the same is true for reinforcement learning agents as well. Given limited experience in a real environment, we use imagination-based reinforcement learning to train a policy on dream-like episodes, where non-imaginative, predicted trajectories are modified through generative augmentations. Experiments on four ProcGen environments show that, compared to classic imagination and offline training on collected experience, our method can reach a higher level of generalization when dealing with sparsely rewarded environments.
( 2
min )
arXiv:2403.08652v1 Announce Type: new
Abstract: Deep Neural Networks (DNNs) do not inherently compute or exhibit empirically-justified task confidence. In mission critical applications, it is important to both understand associated DNN reasoning and its supporting evidence. In this paper, we propose a novel Bayesian approach to extract explanations, justifications, and uncertainty estimates from DNNs. Our approach is efficient both in terms of memory and computation, and can be applied to any black box DNN without any retraining, including applications to anomaly detection and out-of-distribution detection tasks. We validate our approach on the CIFAR-10 dataset, and show that it can significantly improve the interpretability and reliability of DNNs.
( 2
min )
arXiv:2403.07890v1 Announce Type: cross
Abstract: No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.
( 2
min )
arXiv:2401.10294v2 Announce Type: replace-cross
Abstract: We give a procedure for computing group-level $(\epsilon, \delta)$-DP guarantees for DP-SGD, when using Poisson sampling or fixed batch size sampling. Up to discretization errors in the implementation, the DP guarantees computed by this procedure are tight (assuming we release every intermediate iterate).
( 2
min )
arXiv:2403.07923v1 Announce Type: cross
Abstract: In response to the demand for real-time performance and control quality in industrial Internet of Things (IoT) environments, this paper proposes an optimization control system based on deep reinforcement learning and edge computing. The system leverages cloud-edge collaboration, deploys lightweight policy networks at the edge, predicts system states, and outputs controls at a high frequency, enabling monitoring and optimization of industrial objectives. Additionally, a dynamic resource allocation mechanism is designed to ensure rational scheduling of edge computing resources, achieving global optimization. Results demonstrate that this approach reduces cloud-edge communication latency, accelerates response to abnormal situations, reduces system failure rates, extends average equipment operating time, and saves costs for manual maintenance and replacement. This ensures real-time and stable control.
( 2
min )
arXiv:2304.10151v3 Announce Type: replace
Abstract: The K Nearest Neighbors (KNN) classifier is widely used in many fields such as fingerprint-based localization or medicine. It determines the class membership of unlabelled sample based on the class memberships of the K labelled samples, the so-called nearest neighbors, that are closest to the unlabelled sample. The choice of K has been the topic of various studies and proposed KNN-variants. Yet no variant has been proven to outperform all other variants. In this paper a KNN-variant is discussed which ensures that the K nearest neighbors are indeed close to the unlabelled sample and finds K along the way. The algorithm is tested and compared to the standard KNN in theoretical scenarios and for indoor localization based on ion-mobility spectrometry fingerprints. It achieves a higher classification accuracy than the KNN in the tests, while having the same computational demand.
( 3
min )
arXiv:2403.08609v1 Announce Type: new
Abstract: Achieving robust uncertainty quantification for deep neural networks represents an important requirement in many real-world applications of deep learning such as medical imaging where it is necessary to assess the reliability of a neural network's prediction. Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks. Unfortunately, generating samples from the posterior distribution of neural networks is a major challenge. One significant advance in that direction would be the incorporation of adaptive step sizes, similar to modern neural network optimizers, into Monte Carlo Markov chain sampling algorithms without significantly increasing computational demand. Over the past years, several papers have introduced sampling algorithms with claims that they achieve this property. However, do they indeed converge to the correct distribution? In this paper, we demonstrate that these methods can have a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size.
( 3
min )
arXiv:2403.08700v1 Announce Type: cross
Abstract: Obstetric ultrasound image quality is crucial for accurate diagnosis and monitoring of fetal health. However, producing high-quality standard planes is difficult, influenced by the sonographer's expertise and factors like the maternal BMI or the fetus dynamics. In this work, we propose using diffusion-based counterfactual explainable AI to generate realistic high-quality standard planes from low-quality non-standard ones. Through quantitative and qualitative evaluation, we demonstrate the effectiveness of our method in producing plausible counterfactuals of increased quality. This shows future promise both for enhancing training of clinicians by providing visual feedback, as well as for improving image quality and, consequently, downstream diagnosis and monitoring.
( 2
min )
arXiv:2403.08609v1 Announce Type: cross
Abstract: Achieving robust uncertainty quantification for deep neural networks represents an important requirement in many real-world applications of deep learning such as medical imaging where it is necessary to assess the reliability of a neural network's prediction. Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks. Unfortunately, generating samples from the posterior distribution of neural networks is a major challenge. One significant advance in that direction would be the incorporation of adaptive step sizes, similar to modern neural network optimizers, into Monte Carlo Markov chain sampling algorithms without significantly increasing computational demand. Over the past years, several papers have introduced sampling algorithms with claims that they achieve this property. However, do they indeed converge to the correct distribution? In this paper, we demonstrate that these methods can have a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size.
( 3
min )
arXiv:2403.08652v1 Announce Type: cross
Abstract: Deep Neural Networks (DNNs) do not inherently compute or exhibit empirically-justified task confidence. In mission critical applications, it is important to both understand associated DNN reasoning and its supporting evidence. In this paper, we propose a novel Bayesian approach to extract explanations, justifications, and uncertainty estimates from DNNs. Our approach is efficient both in terms of memory and computation, and can be applied to any black box DNN without any retraining, including applications to anomaly detection and out-of-distribution detection tasks. We validate our approach on the CIFAR-10 dataset, and show that it can significantly improve the interpretability and reliability of DNNs.
( 2
min )
This is a guest post co-written with Scott Gutterman from the PGA TOUR. Generative artificial intelligence (generative AI) has enabled new possibilities for building intelligent systems. Recent improvements in Generative AI based large language models (LLMs) have enabled their use in a variety of applications surrounding information retrieval. Given the data sources, LLMs provided tools […]
( 11
min )
In the world of software development, code review and approval are important processes for ensuring the quality, security, and functionality of the software being developed. However, managers tasked with overseeing these critical processes often face numerous challenges, such as the following: Lack of technical expertise – Managers may not have an in-depth technical understanding of […]
( 9
min )
Generative AI applications driven by foundational models (FMs) are enabling organizations with significant business value in customer experience, productivity, process optimization, and innovations. However, adoption of these FMs involves addressing some key challenges, including quality output, data privacy, security, integration with organization data, cost, and skills to deliver. In this post, we explore different approaches […]
( 19
min )
Today, we’re excited to announce that the Gemma model is now available for customers using Amazon SageMaker JumpStart. Gemma is a family of language models based on Google’s Gemini models, trained on up to 6 trillion tokens of text. The Gemma family consists of two sizes: a 7 billion parameter model and a 2 billion parameter model. Now, […]
( 20
min )
The stars align this GFN Thursday as more top titles from Ubisoft and Square Enix join the cloud. Star Wars Outlaws will be coming to the GeForce NOW library at launch later this year, while STAR OCEAN THE SECOND STORY R and PARANORMASIGHT: The Seven Mysteries of Honjo are part of eight new titles joining
Read Article
( 6
min )
NVIDIA’s founder and CEO will discuss the future of AI at one of the world’s premier technology conferences.
( 5
min )
arXiv:2403.07092v1 Announce Type: cross
Abstract: Accurate detection and segmentation of diffuse large B-cell lymphoma (DLBCL) from PET images has important implications for estimation of total metabolic tumor volume, radiomics analysis, surgical intervention and radiotherapy. Manual segmentation of tumors in whole-body PET images is time-consuming, labor-intensive and operator-dependent. In this work, we develop and validate a fast and efficient three-step cascaded deep learning model for automated detection and segmentation of DLBCL tumors from PET images. As compared to a single end-to-end network for segmentation of tumors in whole-body PET images, our three-step model is more effective (improves 3D Dice score from 58.9% to 78.1%) since each of its specialized modules, namely the slice classifier, the tumor detector and the tumor segmentor, can be trained independently to a high degree of skill to carry out a specific task, rather than a single network with suboptimal performance on overall segmentation.
( 3
min )
arXiv:2403.07194v1 Announce Type: cross
Abstract: The aim of this study was to predict university students' learning performance using different sources of data from an Intelligent Tutoring System. We collected and preprocessed data from 40 students from different multimodal sources: learning strategies from system logs, emotions from face recording videos, interaction zones from eye tracking, and test performance from final knowledge evaluation. Our objective was to test whether the prediction could be improved by using attribute selection and classification ensembles. We carried out three experiments by applying six classification algorithms to numerical and discretized preprocessed multimodal data. The results show that the best predictions were produced using ensembles and selecting the best attributes approach with numerical data.
( 2
min )
arXiv:2403.07605v1 Announce Type: cross
Abstract: In text-to-image generation, using negative prompts, which describe undesirable image characteristics, can significantly boost image quality. However, producing good negative prompts is manual and tedious. To address this, we propose NegOpt, a novel method for optimizing negative prompt generation toward enhanced image generation, using supervised fine-tuning and reinforcement learning. Our combined approach results in a substantial increase of 25% in Inception Score compared to other approaches and surpasses ground-truth negative prompts from the test set. Furthermore, with NegOpt we can preferentially optimize the metrics most important to us. Finally, we construct Negative Prompts DB, a dataset of negative prompts.
( 2
min )
arXiv:2403.07137v1 Announce Type: cross
Abstract: Assessing the biotype of cattle through human visual inspection is a very common and important practice in precision cattle breeding. This paper presents the results of a correlation analysis between scores produced by humans for Nelore cattle and a variety of measurements that can be derived from images or other instruments. It also presents a study using the k-means algorithm to generate new ways of clustering a batch of cattle using the measurements that most correlate with the animal's body weight and visual scores.
( 2
min )
arXiv:2403.07460v1 Announce Type: new
Abstract: Time-to-event analysis is a branch of statistics that has increased in popularity during the last decades due to its many application fields, such as predictive maintenance, customer churn prediction and population lifetime estimation. In this paper, we review and compare the performance of several prediction models for time-to-event analysis. These consist of semi-parametric and parametric statistical models, in addition to machine learning approaches. Our study is carried out on three datasets and evaluated in two different scores (the integrated Brier score and concordance index). Moreover, we show how ensemble methods, which surprisingly have not yet been much studied in time-to-event analysis, can improve the prediction accuracy and enhance the robustness of the prediction performance. We conclude the analysis with a simulation experiment in which we evaluate the factors influencing the performance ranking of the methods using both scores.
( 2
min )
arXiv:2403.07191v1 Announce Type: new
Abstract: Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale. Yet, there is a noticeable absence of a cost-effective and standardized testbed tailored to evaluating and comparing these algorithms. To bridge this gap, we present a generalized version of the 24-Puzzle: the $(N,K)$-Puzzle, which challenges language models to reach a target value $K$ with $N$ integers. We evaluate the effectiveness of established RL algorithms such as Proximal Policy Optimization (PPO), alongside novel approaches like Identity Policy Optimization (IPO) and Direct Policy Optimization (DPO).
( 2
min )
Online gaming and social communities offer voice and text chat functionality for their users to communicate. Although voice and text chat often support friendly banter, it can also lead to problems such as hate speech, cyberbullying, harassment, and scams. Today, many companies rely solely on human moderators to review toxic content. However, verifying violations in […]
( 9
min )
Advancements in artificial intelligence (AI) and machine learning (ML) are revolutionizing the financial industry for use cases such as fraud detection, credit worthiness assessment, and trading strategy optimization. To develop models for such use cases, data scientists need access to various datasets like credit decision engines, customer transactions, risk appetite, and stress testing. Managing appropriate […]
( 6
min )
If AI is having its iPhone moment, then chatbots are one of its first popular apps.
( 8
min )
AI-driven change is in the air, as are concerns about the technology’s environmental impact. In this episode of NVIDIA’s AI Podcast, Daniel Castro, vice president of the Information Technology and Innovation Foundation and director of its Center for Data Innovation, speaks with host Noah Kravitz about the motivation behind his AI energy use report, which
Read Article
( 5
min )
Entrepreneurs are always looking for new and creative ways to keep ahead of the competition in the advancement of AI technology. One such…
( 9
min )
We have partnered with international news organizations Le Monde and Prisa Media to bring French and Spanish news content to ChatGPT.
( 2
min )
arXiv:2403.06279v1 Announce Type: cross
Abstract: This paper aims to develop and provide a rigorous treatment to the problem of entropy regularized fine-tuning in the context of continuous-time diffusion models, which was recently proposed by Uehara et al. ( arXiv:2402.15194, 2024). We also show how the analysis can be extended to fine-tuning involving a general $f$-divergence regularizer.
( 2
min )
arXiv:2309.05961v4 Announce Type: replace-cross
Abstract: Community Question Answering (CQA) platforms steadily gain popularity as they provide users with fast responses to their queries. The swiftness of these responses is contingent on a mixture of query-specific and user-related elements. This paper scrutinizes these contributing factors within the context of six highly popular CQA platforms, identified through their standout answering speed. Our investigation reveals a correlation between the time taken to yield the first response to a question and several variables: the metadata, the formulation of the questions, and the level of interaction among users. Additionally, by employing conventional machine learning models to analyze these metadata and patterns of user interaction, we endeavor to predict which queries will receive their initial responses promptly.
( 2
min )
arXiv:2305.09744v2 Announce Type: replace-cross
Abstract: The physics potential of massive liquid argon TPCs in the low-energy regime is still to be fully reaped because few-hits events encode information that can hardly be exploited by conventional classification algorithms. Machine learning (ML) techniques give their best in these types of classification problems. In this paper, we evaluate their performance against conventional (deterministic) algorithms. We demonstrate that both Convolutional Neural Networks (CNN) and Transformer-Encoder methods outperform deterministic algorithms in one of the most challenging classification problems of low-energy physics (single- versus double-beta events). We discuss the advantages and pitfalls of Transformer-Encoder methods versus CNN and employ these methods to optimize the detector parameters, with an emphasis on the DUNE Phase II detectors ("Module of Opportunity").
( 2
min )
arXiv:2310.17159v2 Announce Type: replace
Abstract: We present a new loss function that addresses the out-of-distribution (OOD) calibration problem. While many objective functions have been proposed to effectively calibrate models in-distribution, our findings show that they do not always fare well OOD. Based on the Principle of Maximum Entropy, we incorporate helpful statistical constraints observed during training, delivering better model calibration without sacrificing accuracy. We provide theoretical analysis and show empirically that our method works well in practice, achieving state-of-the-art calibration on both synthetic and real-world benchmarks.
( 2
min )
arXiv:2403.06557v1 Announce Type: cross
Abstract: We present a data-driven control architecture for modifying the kinematics of robots and artificial avatars to encode specific information such as the presence or not of an emotion in the movements of an avatar or robot driven by a human operator. We validate our approach on an experimental dataset obtained during the reach-to-grasp phase of a pick-and-place task.
( 2
min )
arXiv:2403.06545v1 Announce Type: cross
Abstract: The creation of in-silico datasets can expand the utility of existing annotations to new domains with different staining patterns in computational pathology. As such, it has the potential to significantly lower the cost associated with building large and pixel precise datasets needed to train supervised deep learning models. We propose a novel approach for the generation of in-silico immunohistochemistry (IHC) images by disentangling morphology specific IHC stains into separate image channels in immunofluorescence (IF) images. The proposed approach qualitatively and quantitatively outperforms baseline methods as proven by training nucleus segmentation models on the created in-silico datasets.
( 2
min )
arXiv:2403.06524v1 Announce Type: new
Abstract: We develop a deep reinforcement learning framework for tactical decision making in an autonomous truck, specifically for Adaptive Cruise Control (ACC) and lane change maneuvers in a highway scenario. Our results demonstrate that it is beneficial to separate high-level decision-making processes and low-level control actions between the reinforcement learning agent and the low-level controllers based on physical models. In the following, we study optimizing the performance with a realistic and multi-objective reward function based on Total Cost of Operation (TCOP) of the truck using different approaches; by adding weights to reward components, by normalizing the reward components and by using curriculum learning techniques.
( 2
min )
arXiv:2403.06031v1 Announce Type: new
Abstract: Machine learning requires defining one's target variable for predictions or decisions, a process that can have profound implications on fairness: biases are often encoded in target variable definition itself, before any data collection or training. We present an interactive simulator, FairTargetSim (FTS), that illustrates how target variable definition impacts fairness. FTS is a valuable tool for algorithm developers, researchers, and non-technical stakeholders. FTS uses a case study of algorithmic hiring, using real-world data and user-defined target variables. FTS is open-source and available at: http://tinyurl.com/ftsinterface. The video accompanying this paper is here: http://tinyurl.com/ijcaifts.
( 2
min )
arXiv:2403.05610v1 Announce Type: new
Abstract: Understanding the convergence process of neural networks is one of the most complex and crucial issues in the field of machine learning. Despite the close association of notable successes in this domain with the convergence of artificial neural networks, this concept remains predominantly theoretical. In reality, due to the non-convex nature of the optimization problems that artificial neural networks tackle, very few trained networks actually achieve convergence. To expand recent research efforts on artificial-neural-network convergence, this paper will discuss a different approach based on observations of cohesive-convergence groups emerging during the optimization process of an artificial neural network.
( 2
min )
Traditionally reliant on human expertise and manual research, the legal profession is at the cusp of a technological shift as Artificial…
( 11
min )
By Frederic Friedel
( 11
min )
Why Bias in LLMs is Unavoidable
Continue reading on Becoming Human: Artificial Intelligence Magazine »
( 4
min )
It is the International Brain Awareness Week for 2024, with events across institutes, globally, from March 11 – 17. This week is a good time to explore the pedestal of the brain, against the astounding rise of machines. AI embodiment was recently featured in Scientific American, AI Chatbot Brains Are Going Inside Robot Bodies. What… Read More »Embodied AI: Would LLMs and robots surpass the human brain?
The post Embodied AI: Would LLMs and robots surpass the human brain? appeared first on Data Science Central.
( 22
min )
For students, researchers and educators eager to delve into AI, GTC — NVIDIA’s conference on AI and accelerated computing — is in a class of its own. Taking place from March 18-21 at the San Jose Convention Center, GTC features over 900 talks presented by world-renowned experts in fields such as generative AI, high performance
Read Article
( 5
min )
This post discusses how Nitro Enclaves can help protect LLM model deployments, specifically those that use personally identifiable information (PII) or protected health information (PHI). This post is for educational purposes only and should not be used in production environments without additional controls.
( 12
min )
arXiv:2309.15278v2 Announce Type: replace-cross
Abstract: Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks including reasoning with occluded objects, novel objects appearance, and object reappearance. Throughout our extensive simulation and real-world experiments, we find that our approaches perform well in terms of different numbers of objects and different numbers of distractor actions. Furthermore, we show our approaches outperform an implicit memory baseline.
( 2
min )
arXiv:2210.03859v3 Announce Type: replace-cross
Abstract: This paper proposes an improved linear discriminant analysis called spectrally-corrected and regularized LDA (SRLDA). This method integrates the design ideas of the sample spectrally-corrected covariance matrix and the regularized discriminant analysis. With the support of a large-dimensional random matrix analysis framework, it is proved that SRLDA has a linear classification global optimal solution under the spiked model assumption. According to simulation data analysis, the SRLDA classifier performs better than RLDA and ILDA and is closer to the theoretical classifier. Experiments on different data sets show that the SRLDA algorithm performs better in classification and dimensionality reduction than currently used tools.
( 2
min )
arXiv:2109.03396v2 Announce Type: replace
Abstract: In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $O(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $O(\sqrt[3]{DS^2AT^2})$ by Wei et al. (2017) under the same assumption and matches the theoretical lower bound in $T$.
( 2
min )
arXiv:2403.04798v1 Announce Type: cross
Abstract: This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations". Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video. However, the complexities of these diverse modalities pose challenges for developing an efficient multimodal emotion cause analysis (ECA) system. Our proposed approach addresses these challenges by a two-step framework. We adopt two different approaches in our implementation. In Approach 1, we employ instruction-tuning with two separate Llama 2 models for emotion and cause prediction. In Approach 2, we use GPT-4V for conversation-level video description and employ in-context learning with annotated conversation using GPT 3.5. Our system wins rank 4, and system ablation experiments demonstrate that our proposed solutions achieve significant performance gains. All the experimental codes are available on Github.
( 2
min )
arXiv:2403.05395v1 Announce Type: new
Abstract: Advanced machine learning methods, and more prominently neural networks, have become standard to solve inverse problems over the last years. However, the theoretical recovery guarantees of such methods are still scarce and difficult to achieve. Only recently did unsupervised methods such as Deep Image Prior (DIP) get equipped with convergence and recovery guarantees for generic loss functions when trained through gradient flow with an appropriate initialization. In this paper, we extend these results by proving that these guarantees hold true when using gradient descent with an appropriately chosen step-size/learning rate. We also show that the discretization only affects the overparametrization bound for a two-layer DIP network by a constant and thus that the different guarantees found for the gradient flow will hold for gradient descent.
( 2
min )
arXiv:2403.05033v1 Announce Type: new
Abstract: This paper presents our experiments to quantify the manifolds learned by ML models (in our experiment, we use a GAN model) as they train. We compare the manifolds learned at each epoch to the real manifolds representing the real data. To quantify a manifold, we study the intrinsic dimensions and topological features of the manifold learned by the ML model, how these metrics change as we continue to train the model, and whether these metrics convergence over the course of training to the metrics of the real data manifold.
( 2
min )
arXiv:2311.02490v2 Announce Type: replace-cross
Abstract: This paper studies the commonly utilized windowed Anderson acceleration (AA) algorithm for fixed-point methods, $x^{(k+1)}=q(x^{(k)})$. It provides the first proof that when the operator $q$ is linear and symmetric the windowed AA, which uses a sliding window of prior iterates, improves the root-linear convergence factor over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric Jacobian at a fixed point, a slightly modified AA algorithm is proved to have an analogous root-linear convergence factor improvement over fixed-point iterations. Simulations verify our observations. Furthermore, experiments with different data models demonstrate AA is significantly superior to the standard fixed-point methods for Tyler's M-estimation.
( 2
min )
arXiv:2210.03859v3 Announce Type: replace
Abstract: This paper proposes an improved linear discriminant analysis called spectrally-corrected and regularized LDA (SRLDA). This method integrates the design ideas of the sample spectrally-corrected covariance matrix and the regularized discriminant analysis. With the support of a large-dimensional random matrix analysis framework, it is proved that SRLDA has a linear classification global optimal solution under the spiked model assumption. According to simulation data analysis, the SRLDA classifier performs better than RLDA and ILDA and is closer to the theoretical classifier. Experiments on different data sets show that the SRLDA algorithm performs better in classification and dimensionality reduction than currently used tools.
( 2
min )
The ecosystem around NVIDIA’s technologies has always been verdant — but this is absurd. After a stunning premiere at the World Economic Forum in Davos, immersive artworks based on Refik Anadol Studio’s Large Nature Model will come to the U.S. for the first time at NVIDIA GTC. Offering a deep dive into the synergy between
Read Article
( 6
min )
You might say that James Alberque has a bird’s-eye view of the road congestion and challenges that come with a booming U.S. city. Alberque analyzes traffic data for Raleigh, North Carolina, which has seen its population more than double in the past three decades. The city has been working with NVIDIA and its partners to
Read Article
( 6
min )
VistaPrint, a Cimpress business, is the design and marketing partner to millions of small businesses around the world. For more than two decades, VistaPrint has empowered small businesses to quickly and effectively create the marketing products – from promotional materials and signage to print advertising and more – to get the job done, regardless of […]
( 7
min )
AI Weirdness: the strange side of machine learning
( 2
min )
arXiv:2403.04190v1 Announce Type: new
Abstract: The recent surge in research focused on generating synthetic data from large language models (LLMs), especially for scenarios with limited data availability, marks a notable shift in Generative Artificial Intelligence (AI). Their ability to perform comparably to real-world data positions this approach as a compelling solution to low-resource challenges. This paper delves into advanced technologies that leverage these gigantic LLMs for the generation of task-specific training data. We outline methodologies, evaluation techniques, and practical applications, discuss the current limitations, and suggest potential pathways for future research.
( 2
min )
arXiv:2401.16235v2 Announce Type: replace
Abstract: In soccer, contextual player performance metrics are invaluable to coaches. For example, the ability to perform under pressure during matches distinguishes the elite from the average. Appropriate pressure metric enables teams to assess players' performance accurately under pressure and design targeted training scenarios to address their weaknesses. The primary objective of this paper is to leverage both tracking and event data and game footage to capture the pressure experienced by the possession team in a soccer game scene. We propose a player pressure map to represent a given game scene, which lowers the dimension of raw data and still contains rich contextual information. Not only does it serve as an effective tool for visualizing and evaluating the pressure on the team and each individual, but it can also be utilized as a backbone for accessing players' performance. Overall, our model provides coaches and analysts with a deeper understanding of players' performance under pressure so that they make data-oriented tactical decisions.
( 2
min )
arXiv:2403.04189v1 Announce Type: cross
Abstract: Modern machine learning (ML) applications are becoming increasingly complex and monolithic (single chip) accelerator architectures cannot keep up with their energy efficiency and throughput demands. Even though modern digital electronic accelerators are gradually adopting 2.5D architectures with multiple smaller chiplets to improve scalability, they face fundamental limitations due to a reliance on slow metallic interconnects. This paper outlines how optical communication and computation can be leveraged in 2.5D platforms to realize energy-efficient and high throughput 2.5D ML accelerator architectures.
( 2
min )
arXiv:2403.04109v1 Announce Type: cross
Abstract: Adaptive training programs are crucial for recovery post stroke. However, developing programs that automatically adapt depends on quantifying how difficult a task is for a specific individual at a particular stage of their recovery. In this work, we propose a method that automatically generates regions of different task difficulty levels based on an individual's performance. We show that this technique explains the variance in user performance for a reaching task better than previous approaches to estimating task difficulty.
( 2
min )
arXiv:2403.04764v1 Announce Type: new
Abstract: This paper presents a new approach for batch Bayesian Optimization (BO), where the sampling takes place by minimizing a Thompson Sampling approximation of a regret to uncertainty ratio. Our objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points whilst focusing on points with high predictive means or high uncertainty. We provide high-probability theoretical guarantees on the regret of our algorithm. Finally, numerically, we demonstrate that our method attains state-of-the-art performance on a range of nonconvex test functions, where it outperforms several competitive benchmark batch BO algorithms by an order of magnitude on average.
( 2
min )
arXiv:2403.04580v1 Announce Type: new
Abstract: Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery. While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset. In this study, we construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps. We explore the performance and capabilities of these models, focusing on their ability to predict reaction pathways and recapitulate the roles of catalysts and reagents. Additionally, we demonstrate the potential of mechanistic models in predicting impurities, often overlooked by conventional models. We conclude by evaluating the generalizability of mechanistic models to new reaction types, revealing challenges related to dataset diversity, consecutive predictions, and violations of atom conservation.
( 2
min )
arXiv:2403.04764v1 Announce Type: cross
Abstract: This paper presents a new approach for batch Bayesian Optimization (BO), where the sampling takes place by minimizing a Thompson Sampling approximation of a regret to uncertainty ratio. Our objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points whilst focusing on points with high predictive means or high uncertainty. We provide high-probability theoretical guarantees on the regret of our algorithm. Finally, numerically, we demonstrate that our method attains state-of-the-art performance on a range of nonconvex test functions, where it outperforms several competitive benchmark batch BO algorithms by an order of magnitude on average.
( 2
min )
In the intricate landscape of Conversational AI, Retrieval-Augmented Generation (RAG) emerges as a technical marvel, seamlessly merging the strengths of generative and retrieval models. At its core, RAG tackles the challenge of precision in responses by introducing a dynamic knowledge retrieval component. In this technical exploration, we delve into the underpinnings of RAG. Imagine a… Read More »Decoding RAG: Exploring its significance in the realm of generative AI
The post Decoding RAG: Exploring its significance in the realm of generative AI appeared first on Data Science Central.
( 23
min )
New board members named and enhancements to the governance structure introduced
( 3
min )
Dr. Sue Desmond-Hellmann, Nicole Seligman, Fidji Simo join; Sam Altman rejoins board
( 2
min )
By enabling models to see the world more like humans do, the work could help improve driver safety and shed light on human behavior.
( 7
min )
arXiv:2403.03274v1 Announce Type: cross
Abstract: Digital health technologies (DHT), such as wearable devices, provide personalized, continuous, and real-time monitoring of patient. These technologies are contributing to the development of novel therapies and personalized medicine. Gaining insight from these technologies requires appropriate modeling techniques to capture clinically-relevant changes in disease state. The data generated from these devices is characterized by being stochastic in nature, may have missing elements, and exhibits considerable inter-individual variability - thereby making it difficult to analyze using traditional longitudinal modeling techniques. We present a novel pharmacology-informed neural stochastic differential equation (SDE) model capable of addressing these challenges. Using synthetic data, we demonstrate that our approach is effective in identifying treatment effects and learning causal relationships from stochastic data, thereby enabling counterfactual simulation.
( 2
min )
arXiv:2403.03664v1 Announce Type: cross
Abstract: Ambient air pollution is a pervasive issue with wide-ranging effects on human health, ecosystem vitality, and economic structures. Utilizing data on ambient air pollution concentrations, researchers can perform comprehensive analyses to uncover the multifaceted impacts of air pollution across society. To this end, we introduce Environmental Insights, an open-source Python package designed to democratize access to air pollution concentration data. This tool enables users to easily retrieve historical air pollution data and employ a Machine Learning model for forecasting potential future conditions. Moreover, Environmental Insights includes a suite of tools aimed at facilitating the dissemination of analytical findings and enhancing user engagement through dynamic visualizations. This comprehensive approach ensures that the package caters to the diverse needs of individuals looking to explore and understand air pollution trends and their implications.
( 2
min )
arXiv:2403.03581v1 Announce Type: cross
Abstract: Purpose: Our study explored the use of artificial intelligence (AI) to diagnose autism spectrum disorder (ASD). It focused on machine learning (ML) and deep learning (DL) to detect ASD from text inputs on social media, addressing challenges in traditional ASD diagnosis.
Methods: We used natural language processing (NLP), ML, and DL models (including decision trees, XGB, KNN, RNN, LSTM, Bi-LSTM, BERT, and BERTweet) to analyze 404,627 tweets, classifying them based on ASD or non-ASD authors. A subset of 90,000 tweets was used for model training and testing.
Results: Our AI models showed high accuracy, with an 88% success rate in identifying texts from individuals with ASD.
Conclusion: The study demonstrates AI's potential in improving ASD diagnosis, especially in children, highlighting the importance of early detection.
( 2
min )
arXiv:2403.03229v1 Announce Type: cross
Abstract: The right ventricular (RV) function deterioration strongly predicts clinical outcomes in numerous circumstances. To boost the clinical deployment of ensemble regression methods that quantify RV volumes using tabular data from the widely available two-dimensional echocardiography (2DE), we propose to complement the volume predictions with uncertainty scores. To this end, we employ an instance-based method which uses the learned tree structure to identify the nearest training samples to a target instance and then uses a number of distribution types to more flexibly model the output. The probabilistic and point-prediction performances of the proposed framework are evaluated on a relatively small-scale dataset, comprising 100 end-diastolic and end-systolic RV volumes. The reference values for point performance were obtained from MRI. The results demonstrate that our flexible approach yields improved probabilistic and point performances over other state-of-the-art methods. The appropriateness of the proposed framework is showcased by providing exemplar cases. The estimated uncertainty embodies both aleatoric and epistemic types. This work aligns with trustworthy artificial intelligence since it can be used to enhance the decision-making process and reduce risks. The feature importance scores of our framework can be exploited to reduce the number of required 2DE views which could enhance the proposed pipeline's clinical application.
( 3
min )
arXiv:2403.03728v1 Announce Type: new
Abstract: This study addresses the integration of diversity-based and uncertainty-based sampling strategies in active learning, particularly within the context of self-supervised pre-trained models. We introduce a straightforward heuristic called TCM that mitigates the cold start problem while maintaining strong performance across various data levels. By initially applying TypiClust for diversity sampling and subsequently transitioning to uncertainty sampling with Margin, our approach effectively combines the strengths of both strategies. Our experiments demonstrate that TCM consistently outperforms existing methods across various datasets in both low and high data regimes.
( 2
min )
arXiv:2403.03292v1 Announce Type: new
Abstract: State-of-the-art decentralized learning algorithms typically require the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the data distribution across the agents can have significant heterogeneity. In this work, we propose averaging rate scheduling as a simple yet effective way to reduce the impact of heterogeneity in decentralized learning. Our experiments illustrate the superiority of the proposed method (~3% improvement in test accuracy) compared to the conventional approach of employing a constant averaging rate.
( 2
min )
arXiv:2306.00266v2 Announce Type: replace-cross
Abstract: We propose an efficient algorithm for matching two correlated Erd\H{o}s--R\'enyi graphs with $n$ vertices whose edges are correlated through a latent vertex correspondence. When the edge density $q= n^{- \alpha+o(1)}$ for a constant $\alpha \in [0,1)$, we show that our algorithm has polynomial running time and succeeds to recover the latent matching as long as the edge correlation is non-vanishing. This is closely related to our previous work on a polynomial-time algorithm that matches two Gaussian Wigner matrices with non-vanishing correlation, and provides the first polynomial-time random graph matching algorithm (regardless of the regime of $q$) when the edge correlation is below the square root of the Otter's constant (which is $\approx 0.338$).
( 2
min )
2024 will be the year generative AI gets personal, the CEOs of NVIDIA and HP said today in a fireside chat, unveiling new laptops that can build, test and run large language models. “This is a renaissance of the personal computer,” said NVIDIA founder and CEO Jensen Huang at HP Amplify, a gathering in Las
Read Article
( 6
min )
NVIDIA is offering a new professional certification in generative AI to enable developers to establish technical credibility in this important domain. Generative AI is revolutionizing industries worldwide, yet there’s a critical skills gap and need to uplevel employees to more fully harness the technology. Available for the first time from NVIDIA, this new professional certification
Read Article
( 5
min )
Gamers can now seize the day with Day Passes, available to purchase for 24-hour continuous access to powerful cloud gaming with all the benefits of a GeForce NOW Ultimate or Priority membership — no commitment required. Publisher Cygames brings its next triple-A title to the cloud. Granblue Fantasy: Relink leads eight new games joining the
Read Article
( 6
min )
Many customers, including those in creative advertising, media and entertainment, ecommerce, and fashion, often need to change the background in a large number of images. Typically, this involves manually editing each image with photo software. This can take a lot of effort, especially for large batches of images. However, Amazon Bedrock and AWS Step Functions […]
( 8
min )
Structural Understanding Capabilities is a new benchmark for evaluating and improving LLM comprehension of structured table data. This advance can help LLMs process and analyze data more effectively, broadening their applicability in real-world tasks.
The post Improving LLM understanding of structured data and exploring advanced prompting methods appeared first on Microsoft Research.
( 10
min )
arXiv:2311.18377v2 Announce Type: replace-cross
Abstract: Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules and chemical reactions to pretrain the BERT model, enhancing its performance in the virtual screening of organic materials. By fine-tuning the BERT models with data from five virtual screening tasks, the version pretrained with the USPTO-SMILES dataset achieved R2 scores exceeding 0.94 for three tasks and over 0.81 for two others. This performance surpasses that of models pretrained on the small molecule or organic materials databases and outperforms three traditional machine learning models trained directly on virtual screening data. The success of the USPTO-SMILES pretrained BERT model can be attributed to the diverse array of organic building blocks in the USPTO database, offering a broader exploration of the chemical space. The study further suggests that accessing a reaction database with a wider range of reactions than the USPTO could further enhance model performance. Overall, this research validates the feasibility of applying transfer learning across different chemical domains for the efficient virtual screening of organic materials.
( 3
min )
arXiv:2311.03131v3 Announce Type: replace-cross
Abstract: In this study, we address the challenge of analyzing electrophysiological measurements in neuronal networks. Our computational model, based on the Reservoir Computing Network (RCN) architecture, deciphers spatio-temporal data obtained from electrophysiological measurements of neuronal cultures. By reconstructing the network structure on a macroscopic scale, we reveal the connectivity between neuronal units. Notably, our model outperforms common methods like Cross-Correlation and Transfer-Entropy in predicting the network's connectivity map. Furthermore, we experimentally validate its ability to forecast network responses to specific inputs, including localized optogenetic stimuli.
( 2
min )
arXiv:2403.02873v1 Announce Type: new
Abstract: This short note describes a simple technique for analyzing probabilistic algorithms that rely on a light-tailed (but not necessarily bounded) source of randomization. We show that the analysis of such an algorithm can be reduced, in a black-box manner and with only a small loss in logarithmic factors, to an analysis of a simpler variant of the same algorithm that uses bounded random variables and often easier to analyze. This approach simultaneously applies to any light-tailed randomization, including exponential, sub-Gaussian, and more general fast-decaying distributions, without needing to appeal to specialized concentration inequalities. Analyses of a generalized Azuma inequality and stochastic optimization with general light-tailed noise are provided to illustrate the technique.
( 2
min )
arXiv:2403.02786v1 Announce Type: new
Abstract: Addressing the challenge of limited labeled data in clinical settings, particularly in the prediction of fatty liver disease, this study explores the potential of graph representation learning within a semi-supervised learning framework. Leveraging graph neural networks (GNNs), our approach constructs a subject similarity graph to identify risk patterns from health checkup data. The effectiveness of various GNN approaches in this context is demonstrated, even with minimal labeled samples. Central to our methodology is the inclusion of human-centric explanations through explainable GNNs, providing personalized feature importance scores for enhanced interpretability and clinical relevance, thereby underscoring the potential of our approach in advancing healthcare practices with a keen focus on graph representation learning and human-centric explanation.
( 2
min )
arXiv:2001.05371v4 Announce Type: replace
Abstract: Deep neural networks have shown excellent performances in many real-world applications. Unfortunately, they may show "Clever Hans"-like behavior -- making use of confounding factors within datasets -- to achieve high performance. In this work, we introduce the novel learning setting of "explanatory interactive learning" (XIL) and illustrate its benefits on a plant phenotyping research task. XIL adds the scientist into the training loop such that she interactively revises the original model via providing feedback on its explanations. Our experimental results demonstrate that XIL can help avoiding Clever Hans moments in machine learning and encourages (or discourages, if appropriate) trust into the underlying model.
( 2
min )
arXiv:2403.02936v1 Announce Type: cross
Abstract: In this paper, we propose an architecture of a novel adaptive fault-tolerant approximate multiplier tailored for ASIC-based DNN accelerators.
( 2
min )
arXiv:2403.02930v1 Announce Type: cross
Abstract: We present a detailed replication study of the BASS framework, an abstractive summarization system based on the notion of Unified Semantic Graphs. Our investigation includes challenges in replicating key components and an ablation study to systematically isolate error sources rooted in replicating novel components. Our findings reveal discrepancies in performance compared to the original work. We highlight the significance of paying careful attention even to reasonably omitted details for replicating advanced frameworks like BASS, and emphasize key practices for writing replicable papers.
( 2
min )
arXiv:2403.02946v1 Announce Type: cross
Abstract: Systolic array has emerged as a prominent architecture for Deep Neural Network (DNN) hardware accelerators, providing high-throughput and low-latency performance essential for deploying DNNs across diverse applications. However, when used in safety-critical applications, reliability assessment is mandatory to guarantee the correct behavior of DNN accelerators. While fault injection stands out as a well-established practical and robust method for reliability assessment, it is still a very time-consuming process. This paper addresses the time efficiency issue by introducing a novel hierarchical software-based hardware-aware fault injection strategy tailored for systolic array-based DNN accelerators.
( 2
min )
arXiv:2403.02906v1 Announce Type: cross
Abstract: Technology is increasingly used in Nature Reserves and National Parks around the world to support conservation efforts. Endangered species, such as the Eurasian Lynx (Lynx lynx), are monitored by a network of automatic photo traps. Yet, this method produces vast amounts of data, which needs to be prepared, analyzed and interpreted. Therefore, researchers working in this area increasingly need support to process this incoming information. One opportunity is to seek support from volunteer Citizen Scientists who can help label the data, however, it is challenging to retain their interest. Another way is to automate the process with image recognition using convolutional neural networks. During the panel, we will discuss considerations related to nature research and conservation as well as opportunities for the use of Citizen Science and Machine Learning to expedite the process of data preparation, labelling and analysis.
( 2
min )
arXiv:2403.02432v1 Announce Type: cross
Abstract: We study a new technique for understanding convergence of learning agents under small modifications of data. We show that such convergence can be understood via an analogue of Fatou's lemma which yields gamma-convergence. We show it's relevance and applications in general machine learning tasks and domain adaptation transfer learning.
( 2
min )
arXiv:2403.03069v1 Announce Type: new
Abstract: We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions. We introduce two strategies based on (i) finite variational-mixture and (ii) imputation-based variational-mixture distributions to address the increased posterior complexity. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data.
( 2
min )
arXiv:2403.02811v1 Announce Type: cross
Abstract: In this paper, we study how the Koopman operator framework can be combined with kernel methods to effectively control nonlinear dynamical systems. While kernel methods have typically large computational requirements, we show how random subspaces (Nystr\"om approximation) can be used to achieve huge computational savings while preserving accuracy. Our main technical contribution is deriving theoretical guarantees on the effect of the Nystr\"om approximation. More precisely, we study the linear quadratic regulator problem, showing that both the approximated Riccati operator and the regulator objective, for the associated solution of the optimal control problem, converge at the rate $m^{-1/2}$, where $m$ is the random subspace size. Theoretical findings are complemented by numerical experiments corroborating our results.
( 2
min )
arXiv:2001.05371v4 Announce Type: replace-cross
Abstract: Deep neural networks have shown excellent performances in many real-world applications. Unfortunately, they may show "Clever Hans"-like behavior -- making use of confounding factors within datasets -- to achieve high performance. In this work, we introduce the novel learning setting of "explanatory interactive learning" (XIL) and illustrate its benefits on a plant phenotyping research task. XIL adds the scientist into the training loop such that she interactively revises the original model via providing feedback on its explanations. Our experimental results demonstrate that XIL can help avoiding Clever Hans moments in machine learning and encourages (or discourages, if appropriate) trust into the underlying model.
( 2
min )
arXiv:2403.03069v1 Announce Type: cross
Abstract: We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions. We introduce two strategies based on (i) finite variational-mixture and (ii) imputation-based variational-mixture distributions to address the increased posterior complexity. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data.
( 2
min )
arXiv:2403.02432v1 Announce Type: new
Abstract: We study a new technique for understanding convergence of learning agents under small modifications of data. We show that such convergence can be understood via an analogue of Fatou's lemma which yields gamma-convergence. We show it's relevance and applications in general machine learning tasks and domain adaptation transfer learning.
( 2
min )
Research advances are driving real-world impact faster than ever. Episode 2 of Microsoft Research Forum explores how AI is transforming health care and the natural sciences, the intersection of AI and society, and the evolution of foundational AI technologies.
The post Research Forum Episode 2: Transforming health care and the natural sciences, AI and society, and the evolution of foundational AI technologies appeared first on Microsoft Research.
( 11
min )
In this issue: Generative kaleidoscopic networks; Text diffusion with reinforced conditioning; PRISE – Learning temporal action abstractions as a sequence compression problem.
The post Research Focus: Week of March 4, 2024 appeared first on Microsoft Research.
( 9
min )
In this post, we demonstrate how to efficiently fine-tune a state-of-the-art protein language model (pLM) to predict protein subcellular localization using Amazon SageMaker. Proteins are the molecular machines of the body, responsible for everything from moving your muscles to responding to infections. Despite this variety, all proteins are made of repeating chains of molecules called […]
( 9
min )
As visual generative AI matures from research to the enterprise domain, businesses are seeking responsible ways to integrate the technology into their products. Bria, a startup based in Tel Aviv, is responding with an open platform for visual generative AI that emphasizes model transparency alongside fair attribution and copyright protections. Currently offering models that convert
Read Article
( 6
min )
With the 2018 launch of RTX technologies and the first consumer GPU built for AI — GeForce RTX — NVIDIA accelerated the shift to AI computing. Since then, AI on RTX PCs and workstations has grown into a thriving ecosystem with more than 100 million users and 500 AI applications.
( 10
min )
arXiv:2306.08175v2 Announce Type: replace-cross
Abstract: Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the integration of a novel dynamic contextual carry-over mechanism in a state-of-the-art (SOTA) unified ASR system. Our proposed dynamic context Conformer (DCTX-Conformer) utilizes a non-overlapping contextual carry-over mechanism that takes into account both the left context of a chunk and one or more preceding context embeddings. We outperform the SOTA by a relative 25.0% word error rate, with a negligible latency impact due to the additional context embeddings.
( 2
min )
arXiv:2212.04672v3 Announce Type: replace-cross
Abstract: Nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose a primal-dual alternating proximal gradient (PDAPG) algorithm and a primal-dual proximal gradient (PDPG-L) algorithm for solving nonsmooth nonconvex-(strongly) concave and nonconvex-linear minimax problems with coupled linear constraints, respectively. The iteration complexity of the two algorithms are proved to be $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under nonconvex-strongly concave (resp. nonconvex-concave) setting and $\mathcal{O}\left( \varepsilon ^{-3} \right)$ under nonconvex-linear setting to reach an $\varepsilon$-stationary point, respectively. To our knowledge, they are the first two algorithms with iteration complexity guarantees for solving the nonconvex minimax problems with coupled linear constraints.
( 2
min )
arXiv:2401.11667v2 Announce Type: replace
Abstract: This paper introduces INCPrompt, an innovative continual learning solution that effectively addresses catastrophic forgetting. INCPrompt's key innovation lies in its use of adaptive key-learner and task-aware prompts that capture task-relevant information. This unique combination encapsulates general knowledge across tasks and encodes task-specific knowledge. Our comprehensive evaluation across multiple continual learning benchmarks demonstrates INCPrompt's superiority over existing algorithms, showing its effectiveness in mitigating catastrophic forgetting while maintaining high performance. These results highlight the significant impact of task-aware incremental prompting on continual learning performance.
( 2
min )
arXiv:2105.13937v3 Announce Type: replace
Abstract: We present a new class of Langevin based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine tuning of deep learning models. Its underpinning theory relies on recent advances of Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while it addresses other known issues, e.g. vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive optimization algorithms.
( 2
min )
arXiv:2403.02080v1 Announce Type: cross
Abstract: In this paper, we investigate the performance of a Hybrid Quantum Neural Network (HQNN) and a comparable classical Convolution Neural Network (CNN) for detection and classification problem using a radar. Specifically, we take a fairly complex radar time-series model derived from electromagnetic theory, namely the Martin-Mulgrew model, that is used to simulate radar returns of objects with rotating blades, such as drones. We find that when that signal-to-noise ratio (SNR) is high, CNN outperforms the HQNN for detection and classification. However, in the low SNR regime (which is of greatest interest in practice) the performance of HQNN is found to be superior to that of the CNN of a similar architecture.
( 2
min )
arXiv:2403.01805v1 Announce Type: cross
Abstract: Shannon entropy regularization is widely adopted in optimal control due to its ability to promote exploration and enhance robustness, e.g., maximum entropy reinforcement learning known as Soft Actor-Critic. In this paper, Tsallis entropy, which is a one-parameter extension of Shannon entropy, is used for the regularization of linearly solvable MDP and linear quadratic regulators. We derive the solution for these problems and demonstrate its usefulness in balancing between exploration and sparsity of the obtained control law.
( 2
min )
arXiv:2403.01673v1 Announce Type: cross
Abstract: For Multivariate Time Series Forecasting (MTSF), recent deep learning applications show that univariate models frequently outperform multivariate ones. To address the difficiency in multivariate models, we introduce a method to Construct Auxiliary Time Series (CATS) that functions like a 2D temporal-contextual attention mechanism, which generates Auxiliary Time Series (ATS) from Original Time Series (OTS) to effectively represent and incorporate inter-series relationships for forecasting. Key principles of ATS - continuity, sparsity, and variability - are identified and implemented through different modules. Even with a basic 2-layer MLP as core predictor, CATS achieves state-of-the-art, significantly reducing complexity and parameters compared to previous multivariate models, marking it an efficient and transferable MTSF solution.
( 2
min )
arXiv:2403.01318v1 Announce Type: cross
Abstract: Motivated by the empirical power law of the distributions of credits (e.g., the number of "likes") of viral posts in social media, we introduce the high-dimensional tail index regression and methods of estimation and inference for its parameters. We propose a regularized estimator, establish its consistency, and derive its convergence rate. To conduct inference, we propose to debias the regularized estimate, and establish the asymptotic normality of the debiased estimator. Simulation studies support our theory. These methods are applied to text analyses of viral posts in X (formerly Twitter) concerning LGBTQ+.
( 2
min )
arXiv:2403.01299v1 Announce Type: cross
Abstract: Physically unclonable functions (PUFs) identify integrated circuits using nonlinearly-related challenge-response pairs (CRPs). Ideally, the relationship between challenges and corresponding responses is unpredictable, even if a subset of CRPs is known. Previous work developed a photonic PUF offering improved security compared to non-optical counterparts. Here, we investigate this PUF's susceptibility to Multiple-Valued-Logic-based machine learning attacks. We find that approximately 1,000 CRPs are necessary to train models that predict response bits better than random chance. Given the significant challenge of acquiring a vast number of CRPs from a photonic PUF, our results demonstrate photonic PUF resilience against such attacks.
( 2
min )
arXiv:2403.01076v1 Announce Type: cross
Abstract: OOD detection has become more pertinent with advances in network design and increased task complexity. Identifying which parts of the data a given network is misclassifying has become as valuable as the network's overall performance. We can compress the model with quantization, but it suffers minor performance loss. The loss of performance further necessitates the need to derive the confidence estimate of the network's predictions. In line with this thinking, we introduce an Uncertainty Quantification(UQ) technique to quantify the uncertainty in the predictions from a pre-trained vision model. We subsequently leverage this information to extract valuable predictions while ignoring the non-confident predictions. We observe that our technique saves up to 80% of ignored samples from being misclassified. The code for the same is available here.
( 2
min )
arXiv:2403.00964v1 Announce Type: cross
Abstract: In Natural Language Generation (NLG), contemporary Large Language Models (LLMs) face several challenges, such as generating fluent yet inaccurate outputs and reliance on fluency-centric metrics. This often leads to neural networks exhibiting "hallucinations". The SHROOM challenge focuses on automatically identifying these hallucinations in the generated text. To tackle these issues, we introduce two key components, a data augmentation pipeline incorporating LLM-assisted pseudo-labelling and sentence rephrasing, and a voting ensemble from three models pre-trained on Natural Language Inference (NLI) tasks and fine-tuned on diverse datasets.
( 2
min )
arXiv:2403.00788v1 Announce Type: cross
Abstract: This study introduces and evaluates the PRECISE framework, utilizing OpenAI's GPT-4 to enhance patient engagement by providing clearer and more accessible chest X-ray reports at a sixth-grade reading level. The framework was tested on 500 reports, demonstrating significant improvements in readability, reliability, and understandability. Statistical analyses confirmed the effectiveness of the PRECISE approach, highlighting its potential to foster patient-centric care delivery in healthcare decision-making.
( 2
min )
arXiv:2403.00766v1 Announce Type: cross
Abstract: This paper addresses the critical challenge of managing Quality of Service (QoS) in cloud services, focusing on the nuances of individual tenant expectations and varying Service Level Indicators (SLIs). It introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific QoS management in multi-tenant, multi-accelerator cloud environments. The chosen SLI, deadline hit rate, allows clients to tailor QoS for each service request. A novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems is proposed, with a focus on guaranteeing tenant-wise, model-specific QoS levels while considering real-time constraints.
( 2
min )
arXiv:2403.01922v1 Announce Type: new
Abstract: In industrial and environmental monitoring, achieving real-time and precise fluid flow measurement remains a critical challenge. This study applies linear quantization in FPGA-based soft sensors for fluid flow estimation, significantly enhancing Neural Network model precision by overcoming the limitations of traditional fixed-point quantization. Our approach achieves up to a 10.10% reduction in Mean Squared Error and a notable 9.39% improvement in inference speed through targeted hardware optimizations. Validated across multiple data sets, our findings demonstrate that the optimized FPGA-based quantized models can provide efficient, accurate real-time inference, offering a viable alternative to cloud-based processing in pervasive autonomous systems.
( 2
min )
arXiv:2403.01718v1 Announce Type: new
Abstract: We examined the use of the Ising model as an L0 regularization method for field-aware factorization machines (FFM). This approach improves generalization performance and has the advantage of simultaneously determining the best feature combinations for each of several groups. We can deepen the interpretation and understanding of the model from the similarities and differences in the features selected in each group.
( 2
min )
arXiv:2403.01628v1 Announce Type: new
Abstract: The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the \ac{ML4H} community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four virtual roundtables at ML4H 2022. The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables. Each roundtable session included invited senior chairs (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with interest in the session's topic. Herein we detail the organization process and compile takeaways from these roundtable discussions, including recent advances, applications, and open challenges for each topic. We conclude with a summary and lessons learned across all roundtables. This document serves as a comprehensive review paper, summarizing the recent advancements in machine learning for healthcare as contributed by foremost researchers in the field.
( 3
min )
arXiv:2403.01348v1 Announce Type: new
Abstract: Indoor localization is a critical task in many embedded applications, such as asset tracking, emergency response, and realtime navigation. In this article, we propose a novel fingerprintingbased framework for indoor localization called SANGRIA that uses stacked autoencoder neural networks with gradient boosted trees. Our approach is designed to overcome the device heterogeneity challenge that can create uncertainty in wireless signal measurements across embedded devices used for localization. We compare SANGRIA to several state-of-the-art frameworks and demonstrate 42.96% lower average localization error across diverse indoor locales and heterogeneous devices.
( 2
min )
arXiv:2403.01046v1 Announce Type: new
Abstract: We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly in ReLU networks, a fourth layer creates features that represent reflections of training data about themselves. The Lasso representation sheds insight to globally optimal networks and the solution landscape.
( 2
min )
arXiv:2311.06108v2 Announce Type: replace-cross
Abstract: The consistency of the maximum likelihood estimator for mixtures of elliptically-symmetric distributions for estimating its population version is shown, where the underlying distribution $P$ is nonparametric and does not necessarily belong to the class of mixtures on which the estimator is based. In a situation where $P$ is a mixture of well enough separated but nonparametric distributions it is shown that the components of the population version of the estimator correspond to the well separated components of $P$. This provides some theoretical justification for the use of such estimators for cluster analysis in case that $P$ has well separated subpopulations even if these subpopulations differ from what the mixture model assumes.
( 2
min )
arXiv:2212.04672v3 Announce Type: replace-cross
Abstract: Nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose a primal-dual alternating proximal gradient (PDAPG) algorithm and a primal-dual proximal gradient (PDPG-L) algorithm for solving nonsmooth nonconvex-(strongly) concave and nonconvex-linear minimax problems with coupled linear constraints, respectively. The iteration complexity of the two algorithms are proved to be $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under nonconvex-strongly concave (resp. nonconvex-concave) setting and $\mathcal{O}\left( \varepsilon ^{-3} \right)$ under nonconvex-linear setting to reach an $\varepsilon$-stationary point, respectively. To our knowledge, they are the first two algorithms with iteration complexity guarantees for solving the nonconvex minimax problems with coupled linear constraints.
( 2
min )
arXiv:2105.13937v3 Announce Type: replace-cross
Abstract: We present a new class of Langevin based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine tuning of deep learning models. Its underpinning theory relies on recent advances of Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while it addresses other known issues, e.g. vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive optimization algorithms.
( 2
min )
arXiv:2403.01046v1 Announce Type: cross
Abstract: We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly in ReLU networks, a fourth layer creates features that represent reflections of training data about themselves. The Lasso representation sheds insight to globally optimal networks and the solution landscape.
( 2
min )
arXiv:2403.01673v1 Announce Type: new
Abstract: For Multivariate Time Series Forecasting (MTSF), recent deep learning applications show that univariate models frequently outperform multivariate ones. To address the difficiency in multivariate models, we introduce a method to Construct Auxiliary Time Series (CATS) that functions like a 2D temporal-contextual attention mechanism, which generates Auxiliary Time Series (ATS) from Original Time Series (OTS) to effectively represent and incorporate inter-series relationships for forecasting. Key principles of ATS - continuity, sparsity, and variability - are identified and implemented through different modules. Even with a basic 2-layer MLP as core predictor, CATS achieves state-of-the-art, significantly reducing complexity and parameters compared to previous multivariate models, marking it an efficient and transferable MTSF solution.
( 2
min )
arXiv:2403.01318v1 Announce Type: new
Abstract: Motivated by the empirical power law of the distributions of credits (e.g., the number of "likes") of viral posts in social media, we introduce the high-dimensional tail index regression and methods of estimation and inference for its parameters. We propose a regularized estimator, establish its consistency, and derive its convergence rate. To conduct inference, we propose to debias the regularized estimate, and establish the asymptotic normality of the debiased estimator. Simulation studies support our theory. These methods are applied to text analyses of viral posts in X (formerly Twitter) concerning LGBTQ+.
( 2
min )
Image source https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/ Current LLM applications are mostly based on LangChain or LlamaIndex. LangChain and LlamaIndex are frameworks designed for LLM development. They each cater to different use cases with unique features. LangChain is a framework ideal for creating data-aware and agent-based applications. It offers high-level APIs for easy integration with various large language model (LLM)… Read More »Future of LLM application development – impact of Gemini 1.5 Pro with a 1M context window
The post Future of LLM application development – impact of Gemini 1.5 Pro with a 1M context window appeared first on Data Science Central.
( 21
min )
The 96th Academy Awards nominees for Best Visual Effects are a testament to the incredible technological advancements pushing the boundaries of what’s possible in film. Whether showcasing colossal destruction scenes, heart-pumping action sequences or interstellar adventures, each nominee demonstrates unique contributions in visual effects, or VFX — and they all used cutting-edge NVIDIA technologies in
Read Article
( 8
min )
The Complete Guide to Understanding Large-Language Models and How to Work with Them.
Continue reading on Becoming Human: Artificial Intelligence Magazine »
( 5
min )
Microsoft’s Orca-Math, a specialized small language model, outperforms much larger models in solving math problems that require multi-step reasoning and shows the potential of using feedback to improve language models. Learn more.
The post Orca-Math: Demonstrating the potential of SLMs with model specialization appeared first on Microsoft Research.
( 10
min )
MIT spinout DataCebo helps companies bolster their datasets by creating synthetic data that mimic the real thing.
( 6
min )
arXiv:2303.16548v2 Announce Type: replace-cross
Abstract: This paper studies an infinite horizon optimal control problem for discrete-time linear system and quadratic criteria, both with random parameters which are independent and identically distributed with respect to time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of statistical information of the parameters. We investigate the sub-Gaussianity of the state process and establish global linear convergence guarantee for this approach based on assumptions that are weaker and easier to verify compared to existing results. Numerical experiments are presented to illustrate our result.
( 2
min )
arXiv:2401.13662v2 Announce Type: replace
Abstract: In recent years, various powerful policy gradient algorithms have been proposed in deep reinforcement learning. While all these algorithms build on the Policy Gradient Theorem, the specific design choices differ significantly across algorithms. We provide a holistic overview of on-policy policy gradient algorithms to facilitate the understanding of both their theoretical foundations and their practical implementations. In this overview, we include a detailed proof of the continuous version of the Policy Gradient Theorem, convergence results and a comprehensive discussion of practical algorithms. We compare the most prominent algorithms on continuous control environments and provide insights on the benefits of regularization. All code is available at https://github.com/Matt00n/PolicyGradientsJax.
( 2
min )
arXiv:2403.00746v1 Announce Type: cross
Abstract: We develop a novel deep learning approach for pricing European options in diffusion models, that can efficiently handle high-dimensional problems resulting from Markovian approximations of rough volatility models. The option pricing partial differential equation is reformulated as an energy minimization problem, which is approximated in a time-stepping fashion by deep artificial neural networks. The proposed scheme respects the asymptotic behavior of option prices for large levels of moneyness, and adheres to a priori known bounds for option prices. The accuracy and efficiency of the proposed method is assessed in a series of numerical examples, with particular focus in the lifted Heston model.
( 2
min )
arXiv:2403.00257v1 Announce Type: cross
Abstract: Pulmonary emphysema, the progressive, irreversible loss of lung tissue, is conventionally categorized into three subtypes identifiable on pathology and on lung computed tomography (CT) images. Recent work has led to the unsupervised learning of ten spatially-informed lung texture patterns (sLTPs) on lung CT, representing distinct patterns of emphysematous lung parenchyma based on both textural appearance and spatial location within the lung, and which aggregate into 6 robust and reproducible CT Emphysema Subtypes (CTES). Existing methods for sLTP segmentation, however, are slow and highly sensitive to changes in CT acquisition protocol. In this work, we present a robust 3-D squeeze-and-excitation CNN for supervised classification of sLTPs and CTES on lung CT. Our results demonstrate that this model achieves accurate and reproducible sLTP segmentation on lung CTscans, across two independent cohorts and independently of scanner manufacturer and model.
( 2
min )
arXiv:2403.00236v1 Announce Type: cross
Abstract: We investigate the performance of LLM-based zero-shot stance detection on tweets. Using FlanT5-XXL, an instruction-tuned open-source LLM, with the SemEval 2016 Tasks 6A, 6B, and P-Stance datasets, we study the performance and its variations under different prompts and decoding strategies, as well as the potential biases of the model. We show that the zero-shot approach can match or outperform state-of-the-art benchmarks, including fine-tuned models. We provide various insights into its performance including the sensitivity to instructions and prompts, the decoding strategies, the perplexity of the prompts, and to negations and oppositions present in prompts. Finally, we ensure that the LLM has not been trained on test datasets, and identify a positivity bias which may partially explain the performance differences across decoding strategie
( 2
min )
arXiv:2403.00212v1 Announce Type: cross
Abstract: This research addresses the challenge of training an ASR model for personalized voices with minimal data. Utilizing just 14 minutes of custom audio from a YouTube video, we employ Retrieval-Based Voice Conversion (RVC) to create a custom Common Voice 16.0 corpus. Subsequently, a Cross-lingual Self-supervised Representations (XLSR) Wav2Vec2 model is fine-tuned on this dataset. The developed web-based GUI efficiently transcribes and translates input Hindi videos. By integrating XLSR Wav2Vec2 and mBART, the system aligns the translated text with the video timeline, delivering an accessible solution for multilingual video content transcription and translation for personalized voice.
( 2
min )
arXiv:2403.00196v1 Announce Type: cross
Abstract: Advanced Driver Assistance Systems (ADAS) in intelligent vehicles rely on accurate driver perception within the vehicle cabin, often leveraging a combination of sensing modalities. However, these modalities operate at varying rates, posing challenges for real-time, comprehensive driver state monitoring. This paper addresses the issue of missing data due to sensor frame rate mismatches, introducing a generative model approach to create synthetic yet realistic thermal imagery. We propose using conditional generative adversarial networks (cGANs), specifically comparing the pix2pix and CycleGAN architectures. Experimental results demonstrate that pix2pix outperforms CycleGAN, and utilizing multi-view input styles, especially stacked views, enhances the accuracy of thermal image generation. Moreover, the study evaluates the model's generalizability across different subjects, revealing the importance of individualized training for optimal performance. The findings suggest the potential of generative models in addressing missing frames, advancing driver state monitoring for intelligent vehicles, and underscoring the need for continued research in model generalization and customization.
( 2
min )
arXiv:2403.00666v1 Announce Type: cross
Abstract: We obtain sharp upper and lower bounds for the expected max-sliced 1-Wasserstein distance between a probability measure on a separable Hilbert space and its empirical distribution from $n$ samples. A version of this result for probability measures on Banach spaces is also obtained.
( 2
min )
Image by Gerd Altmann from Pixabay Assisted annotation and other automation methods that can be used for knowledge graph creation and natural language understanding should be more flexible and iterative than they often are. Much depends on the architecture and tooling choices your organization makes. If you make sound choices, you avail yourself of a… Read More »Scaling understanding with the help of feedback loops, knowledge graphs and NLP
The post Scaling understanding with the help of feedback loops, knowledge graphs and NLP appeared first on Data Science Central.
( 22
min )
This post is co-written with Sherwin Chu from Alida. Alida helps the world’s biggest brands create highly engaged research communities to gather feedback that fuels better customer experiences and product innovation. Alida’s customers receive tens of thousands of engaged responses for a single survey, therefore the Alida team opted to leverage machine learning (ML) to […]
( 8
min )
Amazon Bedrock is the best place to build and scale generative AI applications with large language models (LLM) and other foundation models (FMs). It enables customers to leverage a variety of high-performing FMs, such as the Claude family of models by Anthropic, to build custom generative AI applications. Looking back to 2021, when Anthropic first started […]
( 8
min )
Bringing together pioneers in robotics and AI, NVIDIA GTC will be a state-of-the-art showcase of applied AI for autonomous machines. The conference, running March 18-21 at the San Jose Convention Center and online, boasts a star-studded lineup. This includes a fireside chat with Marc Raibert, executive director of The AI Institute, and Dieter Fox, senior
Read Article
( 6
min )
arXiv:2402.19296v1 Announce Type: cross
Abstract: Gastric and oesophageal (OG) cancers are the leading causes of cancer mortality worldwide. In OG cancers, recent studies have showed that PDL1 immune checkpoint inhibitors (ICI) in combination with chemotherapy improves patient survival. However, our understanding of the tumour immune microenvironment in OG cancers remains limited. In this study, we interrogate multiplex immunofluorescence (mIF) images taken from patients with advanced Oesophagogastric Adenocarcinoma (OGA) who received first-line fluoropyrimidine and platinum-based chemotherapy in the PLATFORM trial (NCT02678182) to predict the efficacy of the treatment and to explore the biological basis of patients responding to maintenance durvalumab (PDL1 inhibitor). Our proposed Artificial Intelligence (AI) based marker successfully identified responder from non-responder (p < 0.05) as well as those who could potentially benefit from ICI with statistical significance (p < 0.05) for both progression free and overall survival. Our findings suggest that T cells that express FOXP3 seem to heavily influence the patient treatment response and survival outcome. We also observed that higher levels of CD8+PD1+ cells are consistently linked to poor prognosis for both OS and PFS, regardless of ICI.
( 2
min )
arXiv:2402.19355v1 Announce Type: cross
Abstract: Adversarial examples have proven to threaten speaker identification systems, and several countermeasures against them have been proposed. In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples. We build upon and extend previous work on attack type classification by exploring new architectures. Additionally, we introduce a method for identifying the victim model on which the adversarial attack is carried out. To achieve this, we generate a new dataset containing multiple attacks performed against various victim models. We achieve an AUC of 0.982 for attack detection, with no more than a 0.03 drop in performance for unknown attacks. Our attack classification accuracy (excluding benign) reaches 86.48% across eight attack types using our LightResNet34 architecture, while our victim model classification accuracy reaches 72.28% across four victim models.
( 2
min )
arXiv:2402.19226v1 Announce Type: new
Abstract: This study investigates gender fairness in personalized pain care recommendations using machine learning algorithms. Leveraging a contextual bandits framework, personalized recommendations are formulated and evaluated using LinUCB algorithm on a dataset comprising interactions with $164$ patients across $10$ sessions each. Results indicate that while adjustments to algorithm parameters influence the quality of pain care recommendations, this impact remains consistent across genders. However, when certain patient information, such as self-reported pain measurements, is absent, the quality of pain care recommendations for women is notably inferior to that for men.
( 2
min )
arXiv:2402.18836v1 Announce Type: new
Abstract: This paper investigates how to incorporate expert observations (without explicit information on expert actions) into a deep reinforcement learning setting to improve sample efficiency. First, we formulate an augmented policy loss combining a maximum entropy reinforcement learning objective with a behavioral cloning loss that leverages a forward dynamics model. Then, we propose an algorithm that automatically adjusts the weights of each component in the augmented loss function. Experiments on a variety of continuous control tasks demonstrate that the proposed algorithm outperforms various benchmarks by effectively utilizing available expert observations.
( 2
min )
arXiv:2402.18803v1 Announce Type: new
Abstract: In fair machine learning, one source of performance disparities between groups is over-fitting to groups with relatively few training samples. We derive group-specific bounds on the generalization error of welfare-centric fair machine learning that benefit from the larger sample size of the majority group. We do this by considering group-specific Rademacher averages over a restricted hypothesis class, which contains the family of models likely to perform well with respect to a fair learning objective (e.g., a power-mean). Our simulations demonstrate these bounds improve over a naive method, as expected by theory, with particularly significant improvement for smaller group sizes.
( 2
min )
At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG). In a previous post, we described how Knowledge Bases for Amazon Bedrock manages the end-to-end […]
( 12
min )
The rise of artificial intelligence (AI) has created opportunities to improve the customer experience in the contact center space. Machine learning (ML) technologies continually improve and power the contact center customer experience by providing solutions for capabilities like self-service bots, live call analytics, and post-call analytics. Self-service bots integrated with your call center can help […]
( 10
min )
The Geneva International Motor Show, one of the most important and long-standing global auto exhibitions, opened this week, with the spotlight on several China and U.S. EV makers building on NVIDIA DRIVE that are expanding their presence in Europe. BYD One of the key reveals is BYD’s Yangweng U8 plug-in hybrid large SUV, built on
Read Article
( 5
min )
For some NVIDIANs, it’s always game day. Our Santa Clara-based software quality assurance team boasts some of the world’s top gamers, whose search for bugs and errors is as strategic as their battle plans for toppling top-tier opponents in video games. Two team members of the QA team — friendly colleagues in the office but
Read Article
( 6
min )
Enterprise execs across broad sectors to share their AI strategies and success stories.
( 7
min )
The dynamic landscape of customer-centric businesses requires understanding and improving customer satisfaction. Traditional survey analysis rarely yields real-time actionable insights. However, machine learning (ML) predictive analysis allows organizations to use advanced algorithms to transform customer satisfaction surveys. ML predictive analysis is changing how businesses measure and improve customer satisfaction. Customer feedback analysis is where untapped… Read More »Leveraging machine learning for predictive analysis in customer satisfaction surveys
The post Leveraging machine learning for predictive analysis in customer satisfaction surveys appeared first on Data Science Central.
( 21
min )
I’ve been interested in self-sovereign identity for a number of years now, ever since I interviewed Phil Windley, a founder of the Internet Identity Workshop (IIW) and then chair of the Sovrin Foundation, in 2018. In a self-sovereign identity (SSI) scenario, the users themselves control the sensitive information previously stored by a third party. Take… Read More »Mobile drivers’ licenses: A humbler take on self-sovereign identity and personal data protection
The post Mobile drivers’ licenses: A humbler take on self-sovereign identity and personal data protection appeared first on Data Science Central.
( 22
min )
Lightmatter, founded by three MIT alumni, is using photonic technologies to reinvent how chips communicate and calculate.
( 6
min )
Tamara Broderick uses statistical approaches to understand and quantify the uncertainty that can affect study results.
( 7
min )
arXiv:2310.17491v2 Announce Type: replace
Abstract: The emergence of foundation models, including language and vision models, has reshaped AI's landscape, offering capabilities across various applications. Deploying and fine-tuning these large models, like GPT-3 and BERT, presents challenges, especially in the current foundation model era. We introduce Emulator-Assisted Tuning (EAT) combined with Parameter-Efficient Fine-Tuning (PEFT) to form Parameter-Efficient Emulator-Assisted Tuning (PEAT). Further, we expand this into federated learning as Federated PEAT (FedPEAT). FedPEAT uses adapters, emulators, and PEFT for federated model tuning, enhancing model privacy and memory efficiency. Adapters adjust pre-trained models, while emulators give a compact representation of original models, addressing both privacy and efficiency. Adaptable to various neural networks, our approach also uses deep reinforcement learning for hyper-parameter optimization. We tested FedPEAT in a unique scenario with a server participating in collaborative federated tuning, showcasing its potential in tackling foundation model challenges.
( 2
min )
arXiv:2402.18046v1 Announce Type: new
Abstract: We present a novel data augmentation method to address the challenge of data scarcity in modeling longitudinal patterns in Electronic Health Records (EHR) of patients using natural language processing (NLP) algorithms. The proposed method generates augmented data by rearranging the orders of medical records within a visit where the order of elements are not obvious, if any. Applying the proposed method to the clopidogrel treatment failure detection task enabled up to 5.3% absolute improvement in terms of ROC-AUC (from 0.908 without augmentation to 0.961 with augmentation) when it was used during the pre-training procedure. It was also shown that the augmentation helped to improve performance during fine-tuning procedures, especially when the amount of labeled training data is limited.
( 2
min )
MIT.nano Immersion Lab works with AR/VR startup to create transcontinental medical instruction.
( 6
min )
Molecular geometry modeling is a powerful tool for understanding the intricate relationships between molecular structure and biological activity – a field known as structure-activity relationships (SAR). The main premise of SAR is that the biological activity of a molecule is dictated by its specific chemical structure, not only the connections between nuclei but also how […]
The post ViSNet: A general molecular geometry modeling framework for predicting molecular properties and simulating molecular dynamics appeared first on Microsoft Research.
( 13
min )
This book is for participants in my AI and machine learning certification program. However, it is now free and available to everyone. With tutorials, enterprise-grade projects and solutions, it covers state-of-the-art material on topics such as generative adversarial networks (GAN), specialized LLM, data synthetization, as well as classical machine learning. It is a work in… Read More »New Book: Statistical Optimization for GenAI and Machine Learning
The post New Book: Statistical Optimization for GenAI and Machine Learning appeared first on Data Science Central.
( 21
min )
Amazon Bedrock provides a broad range of models from Amazon and third-party providers, including Anthropic, AI21, Meta, Cohere, and Stability AI, and covers a wide range of use cases, including text and image generation, embedding, chat, high-level agents with reasoning and orchestration, and more. Knowledge Bases for Amazon Bedrock allows you to build performant and […]
( 14
min )
OpenSearch is a scalable, flexible, and extensible open source software suite for search, analytics, security monitoring, and observability applications, licensed under the Apache 2.0 license. Amazon OpenSearch Service is a fully managed service that makes it straightforward to deploy, scale, and operate OpenSearch in the AWS Cloud. OpenSearch uses a probabilistic ranking framework called BM-25 […]
( 12
min )
Creating scalable and efficient machine learning (ML) pipelines is crucial for streamlining the development, deployment, and management of ML models. In this post, we present a framework for automating the creation of a directed acyclic graph (DAG) for Amazon SageMaker Pipelines based on simple configuration files. The framework code and examples presented here only cover […]
( 13
min )
This guest post is written by Vihan Lakshman, Tharun Medini, and Anshumali Shrivastava from ThirdAI. Large-scale deep learning has recently produced revolutionary advances in a vast array of fields. Although this stunning progress in artificial intelligence remains remarkable, the financial costs and energy consumption required to train these models has emerged as a critical bottleneck […]
( 9
min )
AI’s growing influence in large organizations brings crucial challenges in managing AI platforms. These include developing a scalable and operationally efficient platform that adheres to organizational compliance and security standards. Amazon SageMaker Studio offers a comprehensive set of capabilities for machine learning (ML) practitioners and data scientists. These include a fully managed AI development environment […]
( 12
min )
GFN Thursday celebrates this leap day with the addition of a popular game store to the cloud. Stream the first titles from Blizzard Entertainment’s Battle.net, including Diablo IV, Overwatch 2, Call of Duty HQ and Hearthstone, now playable across more devices than ever. They’re all part of the 30 new games coming to GeForce NOW
Read Article
( 6
min )
arXiv:2402.17215v1 Announce Type: cross
Abstract: This note considers the multidimensional unstructured sparse recovery problems. Examples include Fourier inversion and sparse deconvolution. The eigenmatrix is a data-driven construction with desired approximate eigenvalues and eigenvectors proposed for the one-dimensional problems. This note extends the eigenmatrix approach to multidimensional problems. Numerical results are provided to demonstrate the performance of the proposed method.
( 2
min )
arXiv:2402.17698v1 Announce Type: cross
Abstract: In this work, we address the challenge of efficiently modeling dynamical systems in process engineering. We use reduced-order model learning, specifically operator inference. This is a non-intrusive, data-driven method for learning dynamical systems from time-domain data. The application in our study is carbon dioxide methanation, an important reaction within the Power-to-X framework, to demonstrate its potential. The numerical results show the ability of the reduced-order models constructed with operator inference to provide a reduced yet accurate surrogate solution. This represents an important milestone towards the implementation of fast and reliable digital twin architectures.
( 2
min )
arXiv:2402.17317v1 Announce Type: cross
Abstract: Deep Learning is the state-of-the-art technology for segmenting brain tumours. However, this requires a lot of high-quality data, which is difficult to obtain, especially in the medical field. Therefore, our solutions address this problem by using unconventional mechanisms for data augmentation. Generative adversarial networks and registration are used to massively increase the amount of available samples for training three different deep learning models for brain tumour segmentation, the first task of the BraTS2023 challenge. The first model is the standard nnU-Net, the second is the Swin UNETR and the third is the winning solution of the BraTS 2021 Challenge. The entire pipeline is built on the nnU-Net implementation, except for the generation of the synthetic data. The use of convolutional algorithms and transformers is able to fill each other's knowledge gaps. Using the new metric, our best solution achieves the dice results 0.9005, 0.8673, 0.8509 and HD95 14.940, 14.467, 17.699 (whole tumour, tumour core and enhancing tumour) in the validation set.
( 3
min )
arXiv:2402.17249v1 Announce Type: cross
Abstract: The ever-evolving ways attacker continues to im prove their phishing techniques to bypass existing state-of-the-art phishing detection methods pose a mountain of challenges to researchers in both industry and academia research due to the inability of current approaches to detect complex phishing attack. Thus, current anti-phishing methods remain vulnerable to complex phishing because of the increasingly sophistication tactics adopted by attacker coupled with the rate at which new tactics are being developed to evade detection. In this research, we proposed an adaptable framework that combines Deep learning and Randon Forest to read images, synthesize speech from deep-fake videos, and natural language processing at various predictions layered to significantly increase the performance of machine learning models for phishing attack detection.
( 2
min )
arXiv:2402.17232v1 Announce Type: cross
Abstract: We propose a two-scale neural network method for solving partial differential equations (PDEs) with small parameters using physics-informed neural networks (PINNs). We directly incorporate the small parameters into the architecture of neural networks. The proposed method enables solving PDEs with small parameters in a simple fashion, without adding Fourier features or other computationally taxing searches of truncation parameters. Various numerical examples demonstrate reasonable accuracy in capturing features of large derivatives in the solutions caused by small parameters.
( 2
min )
arXiv:2402.17045v1 Announce Type: cross
Abstract: To secure computers and information systems from attackers taking advantage of vulnerabilities in the system to commit cybercrime, several methods have been proposed for real-time detection of vulnerabilities to improve security around information systems. Of all the proposed methods, machine learning had been the most effective method in securing a system with capabilities ranging from early detection of software vulnerabilities to real-time detection of ongoing compromise in a system. As there are different types of cyberattacks, each of the existing state-of-the-art machine learning models depends on different algorithms for training which also impact their suitability for detection of a particular type of cyberattack. In this research, we analyzed each of the current state-of-theart machine learning models for different types of cyberattack detection from the past 10 years with a major emphasis on the most recent works for comparative study to identify the knowledge gap where work is still needed to be done with regard to detection of each category of cyberattack
( 2
min )
arXiv:2402.17570v1 Announce Type: new
Abstract: Gaussian Processes (GP) have become popular machine learning methods for kernel based learning on datasets with complicated covariance structures. In this paper, we present a novel extension to the GP framework using a contaminated normal likelihood function to better account for heteroscedastic variance and outlier noise. We propose a scalable inference algorithm based on the Sparse Variational Gaussian Process (SVGP) method for fitting sparse Gaussian process regression models with contaminated normal noise on large datasets. We examine an application to geomagnetic ground perturbations, where the state-of-art prediction model is based on neural networks. We show that our approach yields shorter predictions intervals for similar coverage and accuracy when compared to an artificial dense neural network baseline.
( 2
min )
Daron Acemoglu, David Autor, and Simon Johnson, faculty co-directors of the new MIT Shaping the Future of Work Initiative, describe why the work matters and what they hope to achieve.
( 5
min )
These days, just about everyone is a content creator. But can generative AI help make people create high-quality films and other content affordably? Find out from Pinar Seyhan Demirdag, cofounder and CEO of Cuebric, during his conversation with NVIDIA AI Podcast host Noah Kravitz. Cuebric is on a mission to offer new solutions in filmmaking
Read Article
( 5
min )
YouTube content creator Ralph Panebianco really, really loves video games.
( 6
min )
Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Today, generative AI can enable people without SQL knowledge. This generative AI task is called text-to-SQL, which generates SQL queries from natural language processing (NLP) and converts text into semantically correct SQL. The solution in this post aims to […]
( 11
min )
arXiv:2307.02496v2 Announce Type: replace-cross
Abstract: Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations using external magnetic sensors and solving the inverse problem of Biot-Savart Law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that INN achieves far superior performance compared to Tikhonov regularization.
( 2
min )
arXiv:2402.15962v1 Announce Type: new
Abstract: Manufacturing energy consumption data contains important process signatures required for operational visibility and diagnostics. These signatures may be of different temporal scales, ranging from monthly to sub-second resolutions. We introduce a hierarchical machine learning approach to identify automotive process signatures from paint shop electricity consumption data at varying temporal scales (weekly and daily). A Multi-Layer Perceptron (MLP), a Convolutional Neural Network (CNN), and Principal Component Analysis (PCA) combined with Logistic Regression (LR) are used for the analysis. We validate the utility of the developed algorithms with subject matter experts for (i) better operational visibility, and (ii) identifying energy saving opportunities.
( 2
min )
arXiv:2401.11648v4 Announce Type: replace
Abstract: Predicting next visit diagnosis using Electronic Health Records (EHR) is an essential task in healthcare, critical for devising proactive future plans for both healthcare providers and patients. Nonetheless, many preceding studies have not sufficiently addressed the heterogeneous and hierarchical characteristics inherent in EHR data, inevitably leading to sub-optimal performance. To this end, we propose NECHO, a novel medical code-centric multimodal contrastive EHR learning framework with hierarchical regularisation. First, we integrate multifaceted information encompassing medical codes, demographics, and clinical notes using a tailored network design and a pair of bimodal contrastive losses, all of which pivot around a medical codes representation. We also regularise modality-specific encoders using a parental level information in medical ontology to learn hierarchical structure of EHR data. A series of experiments on MIMIC-III data demonstrates effectiveness of our approach.
( 2
min )
arXiv:2401.05653v2 Announce Type: replace
Abstract: This paper explores the application of Shapley Value Regression in dissecting marketing performance at channel-partner level, complementing channel-level Marketing Mix Modeling (MMM). Utilizing real-world data from the financial services industry, we demonstrate the practicality of Shapley Value Regression in evaluating individual partner contributions. Although structured in-field testing along with cooperative game theory is most accurate, it can often be highly complex and expensive to conduct. Shapley Value Regression is thus a more feasible approach to disentangle the influence of each marketing partner within a marketing channel. We also propose a simple method to derive adjusted coefficients of Shapley Value Regression and compares it with alternative approaches.
( 2
min )
arXiv:2402.16517v1 Announce Type: cross
Abstract: Finite element-based high-order solvers of conservation laws offer large accuracy but face challenges near discontinuities due to the Gibbs phenomenon. Artificial viscosity is a popular and effective solution to this problem based on physical insight. In this work, we present a physics-informed machine learning algorithm to automate the discovery of artificial viscosity models in a non-supervised paradigm. The algorithm is inspired by reinforcement learning and trains a neural network acting cell-by-cell (the viscosity model) by minimizing a loss defined as the difference with respect to a reference solution thanks to automatic differentiation. This enables a dataset-free training procedure. We prove that the algorithm is effective by integrating it into a state-of-the-art Runge-Kutta discontinuous Galerkin solver. We showcase several numerical tests on scalar and vectorial problems, such as Burgers' and Euler's equations in one and two dimensions. Results demonstrate that the proposed approach trains a model that is able to outperform classical viscosity models. Moreover, we show that the learnt artificial viscosity model is able to generalize across different problems and parameters.
( 2
min )
arXiv:2402.15650v1 Announce Type: new
Abstract: Safe reinforcement learning tasks with multiple constraints are a challenging domain despite being very common in the real world. To address this challenge, we propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic. We benchmark Objective Suppression in two multi-constraint safety domains, including an autonomous driving domain where any incorrect behavior can lead to disastrous consequences. Empirically, we demonstrate that our proposed method, when combined with existing safe RL algorithms, can match the task reward achieved by our baselines with significantly fewer constraint violations.
( 2
min )
arXiv:2402.16683v1 Announce Type: cross
Abstract: Imaging is the process of transforming noisy, incomplete data into a space that humans can interpret. NIFTy is a Bayesian framework for imaging and has already successfully been applied to many fields in astrophysics. Previous design decisions held the performance and the development of methods in NIFTy back. We present a rewrite of NIFTy, coined NIFTy.re, which reworks the modeling principle, extends the inference strategies, and outsources much of the heavy lifting to JAX. The rewrite dramatically accelerates models written in NIFTy, lays the foundation for new types of inference machineries, improves maintainability, and enables interoperability between NIFTy and the JAX machine learning ecosystem.
( 2
min )
By breaking an intractable problem into smaller chunks, a deep-learning technique identifies the optimal areas for thinning out traffic in a warehouse.
( 6
min )
This is a guest post written by Axfood AB. In this post, we share how Axfood, a large Swedish food retailer, improved operations and scalability of their existing artificial intelligence (AI) and machine learning (ML) operations by prototyping in close collaboration with AWS experts and using Amazon SageMaker. Axfood is Sweden’s second largest food retailer, […]
( 10
min )
Using LLMs to create structured graphs of image descriptors can enhance the images generated by visual language models. Learn how structured knowledge can improve prompt tuning for both visual and language comprehension.
The post Structured knowledge from LLMs improves prompt learning for visual language models appeared first on Microsoft Research.
( 9
min )
The spirit of software pioneer Grace Hopper will live on at NVIDIA GTC. Accelerated systems using powerful processors — named in honor of the pioneer of software programming — will be on display at the global AI conference running March 18-21, ready to take computing to the next level. System makers will show more than
Read Article
( 6
min )
Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use OpenUSD to build tools, applications and services for 3D workflows and physically accurate virtual worlds. A failed furniture-shopping trip turned into a business idea for Steven Gay, cofounder and CEO of company Mode Maison.
Read Article
( 7
min )
arXiv:2402.15359v1 Announce Type: cross
Abstract: We present the Streaming Gaussian Dirichlet Random Field (S-GDRF) model, a novel approach for modeling a stream of spatiotemporally distributed, sparse, high-dimensional categorical observations. The proposed approach efficiently learns global and local patterns in spatiotemporal data, allowing for fast inference and querying with a bounded time complexity. Using a high-resolution data series of plankton images classified with a neural network, we demonstrate the ability of the approach to make more accurate predictions compared to a Variational Gaussian Process (VGP), and to learn a predictive distribution of observations from streaming categorical data. S-GDRFs open the door to enabling efficient informative path planning over high-dimensional categorical observations, which until now has not been feasible.
( 2
min )
arXiv:2402.15239v1 Announce Type: cross
Abstract: The automated segmentation of cerebral aneurysms is pivotal for accurate diagnosis and treatment planning. Confronted with significant domain shifts and class imbalance in 3D Rotational Angiography (3DRA) data from various medical institutions, the task becomes challenging. These shifts include differences in image appearance, intensity distribution, resolution, and aneurysm size, all of which complicate the segmentation process. To tackle these issues, we propose a novel domain generalization strategy that employs gradient surgery exponential moving average (GS-EMA) optimization technique coupled with boundary-aware contrastive learning (BACL). Our approach is distinct in its ability to adapt to new, unseen domains by learning domain-invariant features, thereby improving the robustness and accuracy of aneurysm segmentation across diverse clinical datasets. The results demonstrate that our proposed approach can extract more domain-invariant features, minimizing over-segmentation and capturing more complete aneurysm structures.
( 2
min )
arXiv:2402.15237v1 Announce Type: cross
Abstract: Unsupervised domain adaptation (UDA) aims to align the labelled source distribution with the unlabelled target distribution to obtain domain-invariant predictive models. Since cross-modality medical data exhibit significant intra and inter-domain shifts and most are unlabelled, UDA is more important while challenging in medical image analysis. This paper proposes a simple yet potent contrastive learning framework for UDA to narrow the inter-domain gap between labelled source and unlabelled target distribution. Our method is validated on cerebral vessel datasets. Experimental results show that our approach can learn latent features from labelled 3DRA modality data and improve vessel segmentation performance in unlabelled MRA modality data.
( 2
min )
arXiv:2310.16119v2 Announce Type: replace
Abstract: We present our SocialBot -- Alquist~5.0 -- developed for the Alexa Prize SocialBot Grand Challenge~5. Building upon previous versions of our system, we introduce the NRG Barista and outline several innovative approaches for integrating Barista into our SocialBot, improving the overall conversational experience. Additionally, we extend our SocialBot to support multimodal devices. This paper offers insights into the development of Alquist~5.0, which meets evolving user expectations while maintaining empathetic and knowledgeable conversational abilities across diverse topics.
( 2
min )
arXiv:2402.15288v1 Announce Type: cross
Abstract: In this work, we present a high-throughput field programmable gate array (FPGA) demonstrator of an artificial neural network (ANN)-based equalizer. The equalization is performed and illustrated in real-time for a 30 GBd, two-level pulse amplitude modulation (PAM2) optical communication system.
( 2
min )
arXiv:2402.15141v1 Announce Type: cross
Abstract: Perturbation and operator adjoint method are used to give the right adjoint form rigourously. From the derivation, we can have following results: 1) The loss gradient is not an ODE, it is an integral and we shows the reason; 2) The traditional adjoint form is not equivalent with the back propagation results. 3) The adjoint operator analysis shows that if and only if the discrete adjoint has the same scheme with the discrete neural ODE, the adjoint form would give the same results as BP does.
( 2
min )
arXiv:2312.01957v2 Announce Type: replace-cross
Abstract: This paper proposes an interpretation of RLAIF as Bayesian inference by introducing distilled Self-Critique (dSC), which refines the outputs of a LLM through a Gibbs sampler that is later distilled into a fine-tuned model. Only requiring synthetic data, dSC is exercised in experiments regarding safety, sentiment, and privacy control, showing it can be a viable and cheap alternative to align LLMs. Code released at \url{https://github.com/vicgalle/distilled-self-critique}.
( 2
min )
arXiv:2402.14980v1 Announce Type: cross
Abstract: Rapid advancements in genome sequencing have led to the collection of vast amounts of genomics data. Researchers may be interested in using machine learning models on such data to predict the pathogenicity or clinical significance of a genetic mutation. However, many genetic datasets contain imbalanced target variables that pose challenges to machine learning models: observations are skewed/imbalanced in regression tasks or class-imbalanced in classification tasks. Genetic datasets are also often high-cardinal and contain skewed predictor variables, which poses further challenges. We aimed to investigate the effects of data preprocessing, feature selection techniques, and model selection on the performance of models trained on these datasets. We measured performance with 5-fold cross-validation and compared averaged r-squared and accuracy metrics across different combinations of techniques. We found that outliers/skew in predictor or target variables did not pose a challenge to regression models. We also found that class-imbalanced target variables and skewed predictors had little to no impact on classification performance. Random forest was the best model to use for imbalanced regression tasks. While our study uses a genetic dataset as an example of a real-world application, our findings can be generalized to any similar datasets.
( 3
min )
arXiv:2402.14982v1 Announce Type: cross
Abstract: In this paper we study the variations in human brain activity when listening to real and fake audio. Our preliminary results suggest that the representations learned by a state-of-the-art deepfake audio detection algorithm, do not exhibit clear distinct patterns between real and fake audio. In contrast, human brain activity, as measured by EEG, displays distinct patterns when individuals are exposed to fake versus real audio. This preliminary evidence enables future research directions in areas such as deepfake audio detection.
( 2
min )
arXiv:2402.14844v1 Announce Type: cross
Abstract: In this paper, we explore a novel combination of supervised learning and quadratic programming to refine dynamic pricing models in the car rental industry. We utilize dynamic modeling of price elasticity, informed by ordinary least squares (OLS) metrics such as p-values, homoscedasticity, error normality. These metrics, when their underlying assumptions hold, are integral in guiding a quadratic programming agent. The program is tasked with optimizing margin for a given finite set target.
( 2
min )
arXiv:2402.14838v1 Announce Type: cross
Abstract: Nowadays, the usage of Large Language Models (LLMs) has increased, and LLMs have been used to generate texts in different languages and for different tasks. Additionally, due to the participation of remarkable companies such as Google and OpenAI, LLMs are now more accessible, and people can easily use them. However, an important issue is how we can detect AI-generated texts from human-written ones. In this article, we have investigated the problem of AI-generated text detection from two different aspects: semantics and syntax. Finally, we presented an AI model that can distinguish AI-generated texts from human-written ones with high accuracy on both multilingual and monolingual tasks using the M4 dataset. According to our results, using a semantic approach would be more helpful for detection. However, there is a lot of room for improvement in the syntactic approach, and it would be a good approach for future work.
( 2
min )
arXiv:2402.15163v1 Announce Type: new
Abstract: This paper presents the first systematic study of the evaluation of Deep Neural Networks (DNNs) for discrete dynamical systems under stochastic assumptions, with a focus on wildfire prediction. We develop a framework to study the impact of stochasticity on two classes of evaluation metrics: classification-based metrics, which assess fidelity to observed ground truth (GT), and proper scoring rules, which test fidelity-to-statistic. Our findings reveal that evaluating for fidelity-to-statistic is a reliable alternative in highly stochastic scenarios. We extend our analysis to real-world wildfire data, highlighting limitations in traditional wildfire prediction evaluation methods, and suggest interpretable stochasticity-compatible alternatives.
( 2
min )
arXiv:2402.14980v1 Announce Type: cross
Abstract: Rapid advancements in genome sequencing have led to the collection of vast amounts of genomics data. Researchers may be interested in using machine learning models on such data to predict the pathogenicity or clinical significance of a genetic mutation. However, many genetic datasets contain imbalanced target variables that pose challenges to machine learning models: observations are skewed/imbalanced in regression tasks or class-imbalanced in classification tasks. Genetic datasets are also often high-cardinal and contain skewed predictor variables, which poses further challenges. We aimed to investigate the effects of data preprocessing, feature selection techniques, and model selection on the performance of models trained on these datasets. We measured performance with 5-fold cross-validation and compared averaged r-squared and accuracy metrics across different combinations of techniques. We found that outliers/skew in predictor or target variables did not pose a challenge to regression models. We also found that class-imbalanced target variables and skewed predictors had little to no impact on classification performance. Random forest was the best model to use for imbalanced regression tasks. While our study uses a genetic dataset as an example of a real-world application, our findings can be generalized to any similar datasets.
( 3
min )
Alumni-founded Pienso has developed a user-friendly AI builder so domain experts can build solutions without writing any code.
( 6
min )
Large Language Models (LLMs) have revolutionized the field of natural language processing (NLP), improving tasks such as language translation, text summarization, and sentiment analysis. However, as these models continue to grow in size and complexity, monitoring their performance and behavior has become increasingly challenging. Monitoring the performance and behavior of LLMs is a critical task […]
( 7
min )
Experienced data scientists will find it helpful to think of hybrid cloud environments as a kind of high-tech ecosystem—complex and full of pitfalls that could swallow you whole if you’re not careful. In this context, keeping tabs on key metrics isn’t just helpful; it’s your secret to making sure everything runs smoother than ever. Here… Read More »5 critical metrics every data scientist should monitor in hybrid cloud environments
The post 5 critical metrics every data scientist should monitor in hybrid cloud environments appeared first on Data Science Central.
( 21
min )
With generative AI and hybrid work environments becoming the new standard, nearly every professional, whether a content creator, researcher or engineer, needs a powerful, AI-accelerated laptop to help users tackle their industry’s toughest challenges — even on the go. The new NVIDIA RTX 500 and 1000 Ada Generation Laptop GPUs will be available in new,
Read Article
( 6
min )
arXiv:2307.09312v4 Announce Type: replace-cross
Abstract: We present the Multi-Modal Discussion Transformer (mDT), a novel methodfor detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies.
( 2
min )
arXiv:2402.14095v1 Announce Type: cross
Abstract: Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth. Code is available at https://github.com/dyballa/zero-shot-generalization.
( 2
min )
arXiv:2402.14578v1 Announce Type: cross
Abstract: In this paper, we consider a deterministic online linear regression model where we allow the responses to be multivariate. To address this problem, we introduce MultiVAW, a method that extends the well-known Vovk-Azoury-Warmuth algorithm to the multivariate setting, and show that it also enjoys logarithmic regret in time. We apply our results to the online hierarchical forecasting problem and recover an algorithm from this literature as a special case, allowing us to relax the hypotheses usually made for its analysis.
( 2
min )
arXiv:2402.14031v1 Announce Type: cross
Abstract: This paper presents a novel autoencoder with ordered variance (AEO) in which the loss function is modified with a variance regularization term to enforce order in the latent space. Further, the autoencoder is modified using ResNets, which results in a ResNet AEO (RAEO). The paper also illustrates the effectiveness of AEO and RAEO in extracting nonlinear relationships among input variables in an unsupervised setting.
( 2
min )
arXiv:2402.14759v1 Announce Type: new
Abstract: The purpose of this paper is to look into how central notions in statistical learning theory, such as realisability, generalise under the assumption that train and test distribution are issued from the same credal set, i.e., a convex set of probability distributions. This can be considered as a first step towards a more general treatment of statistical learning under epistemic uncertainty.
( 2
min )
arXiv:2402.14646v1 Announce Type: new
Abstract: This work introduces reduced models based on Continuous Low Rank Adaptation (CoLoRA) that pre-train neural networks for a given partial differential equation and then continuously adapt low-rank weights in time to rapidly predict the evolution of solution fields at new physics parameters and new initial conditions. The adaptation can be either purely data-driven or via an equation-driven variational approach that provides Galerkin-optimal approximations. Because CoLoRA approximates solution fields locally in time, the rank of the weights can be kept small, which means that only few training trajectories are required offline so that CoLoRA is well suited for data-scarce regimes. Predictions with CoLoRA are orders of magnitude faster than with classical methods and their accuracy and parameter efficiency is higher compared to other neural network approaches.
( 2
min )
arXiv:2402.14532v1 Announce Type: new
Abstract: Obtaining heteroscedastic predictive uncertainties from a Bayesian Neural Network (BNN) is vital to many applications. Often, heteroscedastic aleatoric uncertainties are learned as outputs of the BNN in addition to the predictive means, however doing so may necessitate adding more learnable parameters to the network. In this work, we demonstrate that both the heteroscedastic aleatoric and epistemic variance can be embedded into the variances of learned BNN parameters, improving predictive performance for lightweight networks. By complementing this approach with a moment propagation approach to inference, we introduce a relatively simple framework for sampling-free variational inference suitable for lightweight BNNs.
( 2
min )
arXiv:2402.14481v1 Announce Type: new
Abstract: We introduce the concept of Automated Causal Discovery (AutoCD), defined as any system that aims to fully automate the application of causal discovery and causal reasoning methods. AutoCD's goal is to deliver all causal information that an expert human analyst would and answer a user's causal queries. We describe the architecture of such a platform, and illustrate its performance on synthetic data sets. As a case study, we apply it on temporal telecommunication data. The system is general and can be applied to a plethora of causal discovery problems.
( 2
min )
arXiv:2402.14385v1 Announce Type: new
Abstract: Achieving net zero carbon emissions by 2050 requires the integration of increasing amounts of wind power into power grids. This energy source poses a challenge to system operators due to its variability and uncertainty. Therefore, accurate forecasting of wind power is critical for grid operation and system balancing. This paper presents an innovative approach to short-term (1 to 6 hour horizon) windpower forecasting at a national level. The method leverages Automated Deep Learning combined with Numerical Weather Predictions wind speed maps to accurately forecast wind power.
( 2
min )
arXiv:2402.14384v1 Announce Type: new
Abstract: In this paper, we employ a 1D deep convolutional generative adversarial network (DCGAN) for sequential anomaly detection in energy time series data. Anomaly detection involves gradient descent to reconstruct energy sub-sequences, identifying the noise vector that closely generates them through the generator network. Soft-DTW is used as a differentiable alternative for the reconstruction loss and is found to be superior to Euclidean distance. Combining reconstruction loss and the latent space's prior probability distribution serves as the anomaly score. Our novel method accelerates detection by parallel computation of reconstruction of multiple points and shows promise in identifying anomalous energy consumption in buildings, as evidenced by performing experiments on hourly energy time series from 15 buildings.
( 2
min )
arXiv:2402.14080v1 Announce Type: new
Abstract: Deep learning models are being adopted and applied on various critical decision-making tasks, yet they are trained to provide point predictions without providing degrees of confidence. The trustworthiness of deep learning models can be increased if paired with uncertainty estimations. Conformal Prediction has emerged as a promising method to pair machine learning models with prediction intervals, allowing for a view of the model's uncertainty. However, popular uncertainty estimation methods for conformal prediction fail to provide heteroskedastic intervals that are equally accurate for all samples. In this paper, we propose a method to estimate the uncertainty of each sample by calculating the variance obtained from a Deep Regression Forest. We show that the deep regression forest variance improves the efficiency and coverage of normalized inductive conformal prediction on a drug response prediction task.
( 2
min )
arXiv:2402.14385v1 Announce Type: cross
Abstract: Achieving net zero carbon emissions by 2050 requires the integration of increasing amounts of wind power into power grids. This energy source poses a challenge to system operators due to its variability and uncertainty. Therefore, accurate forecasting of wind power is critical for grid operation and system balancing. This paper presents an innovative approach to short-term (1 to 6 hour horizon) windpower forecasting at a national level. The method leverages Automated Deep Learning combined with Numerical Weather Predictions wind speed maps to accurately forecast wind power.
( 2
min )
arXiv:2402.14080v1 Announce Type: cross
Abstract: Deep learning models are being adopted and applied on various critical decision-making tasks, yet they are trained to provide point predictions without providing degrees of confidence. The trustworthiness of deep learning models can be increased if paired with uncertainty estimations. Conformal Prediction has emerged as a promising method to pair machine learning models with prediction intervals, allowing for a view of the model's uncertainty. However, popular uncertainty estimation methods for conformal prediction fail to provide heteroskedastic intervals that are equally accurate for all samples. In this paper, we propose a method to estimate the uncertainty of each sample by calculating the variance obtained from a Deep Regression Forest. We show that the deep regression forest variance improves the efficiency and coverage of normalized inductive conformal prediction on a drug response prediction task.
( 2
min )
arXiv:2402.14646v1 Announce Type: cross
Abstract: This work introduces reduced models based on Continuous Low Rank Adaptation (CoLoRA) that pre-train neural networks for a given partial differential equation and then continuously adapt low-rank weights in time to rapidly predict the evolution of solution fields at new physics parameters and new initial conditions. The adaptation can be either purely data-driven or via an equation-driven variational approach that provides Galerkin-optimal approximations. Because CoLoRA approximates solution fields locally in time, the rank of the weights can be kept small, which means that only few training trajectories are required offline so that CoLoRA is well suited for data-scarce regimes. Predictions with CoLoRA are orders of magnitude faster than with classical methods and their accuracy and parameter efficiency is higher compared to other neural network approaches.
( 2
min )
arXiv:2402.14532v1 Announce Type: cross
Abstract: Obtaining heteroscedastic predictive uncertainties from a Bayesian Neural Network (BNN) is vital to many applications. Often, heteroscedastic aleatoric uncertainties are learned as outputs of the BNN in addition to the predictive means, however doing so may necessitate adding more learnable parameters to the network. In this work, we demonstrate that both the heteroscedastic aleatoric and epistemic variance can be embedded into the variances of learned BNN parameters, improving predictive performance for lightweight networks. By complementing this approach with a moment propagation approach to inference, we introduce a relatively simple framework for sampling-free variational inference suitable for lightweight BNNs.
( 2
min )
arXiv:2402.14578v1 Announce Type: new
Abstract: In this paper, we consider a deterministic online linear regression model where we allow the responses to be multivariate. To address this problem, we introduce MultiVAW, a method that extends the well-known Vovk-Azoury-Warmuth algorithm to the multivariate setting, and show that it also enjoys logarithmic regret in time. We apply our results to the online hierarchical forecasting problem and recover an algorithm from this literature as a special case, allowing us to relax the hypotheses usually made for its analysis.
( 2
min )
In 2024, companies all around the world are on a relentless quest for innovative solutions to leverage vast amounts of information and elevate their interactions. In this quest, Natural Language Processing (NLP) emerges as a groundbreaking area of artificial intelligence, seamlessly connecting human communication with machine interpretation. NLP is transforming business practices, data analysis, and… Read More »What are the benefits of using Natural Language Processing (NLP) in Business?
The post What are the benefits of using Natural Language Processing (NLP) in Business? appeared first on Data Science Central.
( 22
min )
The momentum around Generative AI in MarTech is undeniable. A staggering 63% of marketing leaders are signaling their intention to invest in this innovative technology in the future. This isn’t a vision for the distant future—it’s already here. It is reshaping our present, unlocking new avenues for innovation and offering solutions that were once thought… Read More »MarTech Trends: Capitalizing on the rising GenAI excitement for business ROI growth
The post MarTech Trends: Capitalizing on the rising GenAI excitement for business ROI growth appeared first on Data Science Central.
( 23
min )
Businesses today heavily rely on data, but many struggle with an overwhelming amount of information that doesn’t provide the necessary insights. Important questions often go unanswered as data is scattered across various files and slow reports. This challenge makes decision-making difficult, opportunities are missed, and progress stalls. Have you ever felt this way? The problem… Read More »SQL Server: Powering your data warehouse with insights and efficiency
The post SQL Server: Powering your data warehouse with insights and efficiency appeared first on Data Science Central.
( 22
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )